CONTRIBUTIONS TO THE THEORY OF SEQUENTIAL ANALYSIS. I


By M. A. Girshick
United States Department of Agriculture 


PART I. APPLICATIONS OF SEQUENTIAL ANALYSIS TO THE RANKING OF TWO
POPULATIONS WITH RESPECT TO A SINGLE PARAMETER.


1. Summary. Given two populations $\pi_1$ and $\pi_2$, each characterized by a
distribution density $f(x, \theta)$ which is assumed to be known except for the
value of the parameter $\theta$, it is desired to test the composite hypothesis
$\theta_1 < \theta_2$ against the alternative hypothesis $\theta_1 > \theta_2$,
where $\theta_i$ is the value of the parameter in the distribution density of
$\pi_i$ $(i = 1, 2)$.

The criterion proposed for testing this hypothesis is based on the sequential
probability ratio and consists of the following: Choose two positive constants
$a$ and $b$ and two values of $\theta$, say $\theta_1^0$ and $\theta_2^0$. Take
pairs of observations $x_{1\alpha}$ from $\pi_1$ and $x_{2\alpha}$ from $\pi_2$
$(\alpha = 1, 2, \ldots)$ in sequence and compute
$Z_j = \sum_{\alpha=1}^{j} z_\alpha$, where

$$z_\alpha = \log \frac{f(x_{2\alpha}, \theta_1^0)\, f(x_{1\alpha}, \theta_2^0)}{f(x_{2\alpha}, \theta_2^0)\, f(x_{1\alpha}, \theta_1^0)}.$$

The hypothesis tested is accepted or rejected depending on whether $Z_n \geq a$
or $Z_n \leq -b$, where $n$ is the smallest integer $j$ for which either one of
these relationships is satisfied.

The boundaries $a$ and $b$ are partly given in terms of the desired risks of
making an erroneous decision. The values $\theta_1^0$ and $\theta_2^0$ define the
magnitude of the difference between the values of $\theta$ in $\pi_1$ and in
$\pi_2$ which is considered worth detecting. It is shown that the power of this
test is constant on a curve $h(\theta_1, \theta_2) = \text{constant}$. If
$E\left(\log \frac{f(x, \theta_2^0)}{f(x, \theta_1^0)} \,\Big|\, \theta\right)$
is a monotonic function of $\theta$, then the test is unbiased in the sense that
all points $(\theta_1, \theta_2)$ which lie on the curve
$h(\theta_1, \theta_2) = \text{constant}$ are such that either every
$\theta_1 < \theta_2$ or every $\theta_1 > \theta_2$. For a large class of known
distributions the quantity $h$ is shown to be an appropriate measure of the
difference between $\theta_1$ and $\theta_2$, and the test procedure for this
class of distributions is simple and intuitively sensible.

For the case of the binomial, the exact power of this test as well as the
distribution of $n$ is given.


1.1 General discussion. Consider two processes (populations) $\pi_1$ and $\pi_2$,
each yielding a measurable quantity $x$ whose distribution density
$f(x, \theta)$ is assumed to be known except for the value of the parameter
$\theta$. On the basis of a random sample obtained from each, it is desired to
choose that process which yields the smaller (or larger) $\theta$. That is, it is
desired to devise a test which will result in a high probability of accepting
$\pi_1$ if the $\theta$ characterizing its distribution density is smaller (or
larger) than the $\theta$ in $\pi_2$, a high probability of rejecting $\pi_1$
(i.e. accepting $\pi_2$) when the opposite is true, and approximately equal
probability of making one or the other decision if the value of $\theta$ in
$\pi_1$ is the same as in $\pi_2$.

As an illustration of the type of problem here considered, let us assume that a 
manufacturer is faced with a choice between two competing processes of pro- 
duction, each process yielding an unknown fraction defective p and each entail- 
ing about the same operating cost. Based on the evidence of a random sample 
selected from each, the manufacturer wishes to choose that process which yields 
the smaller fraction defective. If the fractions defective in the two processes 
differ by a significant amount, he will want a test which guarantees a high
probability of making a correct decision. If, however, the fractions defective
in the two processes are of approximately the same magnitude, it will be a
matter of indifference to him which decision is reached.

The solution given in this paper to the above problem is based on Wald’s 
sequential probability ratio test [1]. The resulting procedure not only requires,
on the average, fewer observations for the same protection than any other test
(which is always the case with sequential tests of this type) but is also direct and 
simple when applied to a large class of distributions commonly met in practice. 


1.2 Derivation of the sequential test when the existence of a priori
probabilities is assumed. The choice of the probability ratio as a method of
discriminating between the two processes is suggested by considerations of a
priori probabilities. Let us assume that each process may have either
$\theta_1^0$ or $\theta_2^0$ as the value of a parameter $\theta$ in its
distribution density and that the value $\theta_1^0$ is more desirable than
$\theta_2^0$. Let us further assume that there exists an a priori probability
$g_1$ that a process will have $\theta_1^0$ as a parameter and an a priori
probability $g_2 = 1 - g_1$ that it will have $\theta_2^0$ as a parameter. Let
the likelihood for $n$ observations $x_{11}, x_{12}, \ldots, x_{1n}$ drawn from
$\pi_1$ be designated by $p(x_{11}, x_{12}, \ldots, x_{1n}, \theta_1^0)$ when
$\theta_1^0$ is the parameter in $\pi_1$, and by
$p(x_{11}, x_{12}, \ldots, x_{1n}, \theta_2^0)$ when $\theta_2^0$ is the
parameter in $\pi_1$. Let the likelihoods
$p(x_{21}, x_{22}, \ldots, x_{2n}, \theta_1^0)$ and
$p(x_{21}, x_{22}, \ldots, x_{2n}, \theta_2^0)$ be similarly defined for $n$
observations $x_{21}, x_{22}, \ldots, x_{2n}$ drawn from $\pi_2$. Then

$$p(x_{i1}, x_{i2}, \ldots, x_{in}, \theta_j^0) = \prod_{\alpha=1}^{n} f(x_{i\alpha}, \theta_j^0), \qquad i, j = 1, 2. \tag{1.201}$$


Let $\beta_{ij}$ $(i, j = 1, 2)$ be the a posteriori probability, having
obtained $x_{i\alpha}$ $(\alpha = 1, 2, \ldots, n)$, that process $\pi_i$ has
$\theta_j^0$ as a parameter in its distribution density. Then

$$\beta_{i1} = \frac{g_1\, p(x_{i1}, \ldots, x_{in}, \theta_1^0)}{g_1\, p(x_{i1}, \ldots, x_{in}, \theta_1^0) + g_2\, p(x_{i1}, \ldots, x_{in}, \theta_2^0)} \tag{1.202}$$

and

$$\beta_{i2} = \frac{g_2\, p(x_{i1}, \ldots, x_{in}, \theta_2^0)}{g_1\, p(x_{i1}, \ldots, x_{in}, \theta_1^0) + g_2\, p(x_{i1}, \ldots, x_{in}, \theta_2^0)} \tag{1.203}$$

for $i = 1, 2$.

In order to decide whether the hypothesis that $\theta_1^0$ belongs to the
distribution density of $\pi_1$ is more tenable than the hypothesis that it
belongs to the distribution of $\pi_2$, it is only necessary to compare
$\beta_{11}$ with $\beta_{21}$. But if $\beta_{11}$ is equal to or greater than
$\beta_{21}$, the ratio $\beta_{11}/\beta_{12}$ must be equal to or greater than
$\beta_{21}/\beta_{22}$, and conversely. For assume that
$\beta_{11} \geq \beta_{21}$. Subtracting $\beta_{11}\beta_{21}$ from each side
of the inequality we get
$\beta_{11}(1 - \beta_{21}) \geq \beta_{21}(1 - \beta_{11})$. But since
$1 - \beta_{21} = \beta_{22}$ and $1 - \beta_{11} = \beta_{12}$, we see that
$\beta_{11}/\beta_{12} \geq \beta_{21}/\beta_{22}$. Conversely, let
$\beta_{11}/\beta_{12} \geq \beta_{21}/\beta_{22}$. Then
$\beta_{11}(1 - \beta_{21}) \geq \beta_{21}(1 - \beta_{11})$, or
$\beta_{11} \geq \beta_{21}$.

From the above it would appear that a sensible sequential procedure for
deciding whether $\theta_1^0$ is more likely to belong to $\pi_1$ than to $\pi_2$
is as follows: Select two positive quantities $A$ and $B$ with $A > 1$ and
$B < 1$. Take a pair of observations $(x_{1\alpha}, x_{2\alpha})$
$(\alpha = 1, 2, \ldots)$ at a time, one from each process. At each step (i.e.,
for each sample size $n$) compute the ratio
$\lambda = \dfrac{\beta_{21}/\beta_{22}}{\beta_{11}/\beta_{12}}$. If at any
stage $\lambda \leq B$, terminate the sampling and accept the hypothesis that
$\theta_1^0$ is a parameter in the distribution density of $\pi_1$. On the other
hand, if at any stage $\lambda \geq A$, terminate sampling and accept the
hypothesis that $\theta_1^0$ is a parameter of the distribution density in
$\pi_2$. If neither holds, that is if $B < \lambda < A$, then take another pair
of observations, consisting of one from each process. Continue this procedure
until one or the other decision is reached.¹

The interesting point here is that the decision function $\lambda$ is
independent of $g_1$ and $g_2$. In fact, it is easily seen from equations
(1.202) and (1.203) that

$$\lambda = \frac{p(x_{21}, x_{22}, \ldots, x_{2n}, \theta_1^0)\, p(x_{11}, x_{12}, \ldots, x_{1n}, \theta_2^0)}{p(x_{21}, x_{22}, \ldots, x_{2n}, \theta_2^0)\, p(x_{11}, x_{12}, \ldots, x_{1n}, \theta_1^0)}. \tag{1.204}$$
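
The cancellation of the a priori probabilities in (1.204) can be checked numerically. The following Python sketch computes the a posteriori probabilities of (1.202) and (1.203) and compares the resulting ratio with the likelihood ratio. The unit-variance normal density and all function names are illustrative assumptions and do not come from the paper.

```python
import math
from typing import Sequence


def normal_density(x: float, theta: float) -> float:
    """Unit-variance normal density; an assumed stand-in for f(x, theta)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def likelihood(xs: Sequence[float], theta: float) -> float:
    """p(x_1, ..., x_n, theta), the product in (1.201)."""
    return math.prod(normal_density(x, theta) for x in xs)


def posterior_ratio_lambda(x1: Sequence[float], x2: Sequence[float],
                           theta1_0: float, theta2_0: float, g1: float) -> float:
    """lambda = (beta_21 / beta_22) / (beta_11 / beta_12) via (1.202)-(1.203)."""
    g2 = 1.0 - g1
    odds = []
    for xs in (x1, x2):  # process pi_1 first, then pi_2
        p1 = g1 * likelihood(xs, theta1_0)
        p2 = g2 * likelihood(xs, theta2_0)
        beta_i1 = p1 / (p1 + p2)
        beta_i2 = p2 / (p1 + p2)
        odds.append(beta_i1 / beta_i2)
    return odds[1] / odds[0]


def likelihood_ratio_lambda(x1: Sequence[float], x2: Sequence[float],
                            theta1_0: float, theta2_0: float) -> float:
    """lambda written directly as the likelihood ratio (1.204)."""
    return (likelihood(x2, theta1_0) * likelihood(x1, theta2_0)) / (
        likelihood(x2, theta2_0) * likelihood(x1, theta1_0))
```

Calling `posterior_ratio_lambda` with different values of `g1` on the same data should reproduce `likelihood_ratio_lambda` up to rounding, which is the statement that $\lambda$ does not depend on $g_1$ and $g_2$.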














1.3 The proposed sequential test as a special case of a sequential probability
ratio test. If we examine the expression given in (1.204) we see that it is a
ratio of two likelihoods. The numerator of the ratio is the likelihood of the
$2n$ observations under the hypothesis that $\theta_2^0$ is a parameter in
$\pi_1$ and $\theta_1^0$ is a parameter in $\pi_2$; the denominator is the
likelihood of the $2n$ observations under the hypothesis that $\theta_1^0$ is a
parameter in $\pi_1$ and $\theta_2^0$ is a parameter in $\pi_2$. Thus, the
proposed sequential test is equivalent to a sequential probability ratio test
(see [1]) for testing the simple hypothesis that $\theta_1^0$ belongs to $\pi_1$
and $\theta_2^0$ belongs to $\pi_2$ against the alternative hypothesis that
$\theta_2^0$ belongs to $\pi_1$ and $\theta_1^0$ belongs to $\pi_2$. We can,
therefore, apply the theory of sequential analysis developed by A. Wald ([1] and
[2]) to this problem.

While the test is posed in terms of a simple hypothesis, the solution, as will be 
shown later, is in fact a solution to a composite hypothesis. In order to bring 
this out more clearly we shall rederive a few of the results which have already 
been obtained by A. Wald. This will be done in sections 1.4, 1.5, and 1.6. 





















¹ That a decision will be reached eventually can be asserted with probability
one if the variance of the variate $z_\alpha$ (defined by (1.301) below) is
different from zero (or if it is zero, the value of $z_\alpha$ is different from
zero). See [2], Lemma 1. As we shall see later, if, in fact, both processes have
either $\theta_1^0$ or $\theta_2^0$ as parameters, then the above sequential
procedure will result in the acceptance of either process with approximately
equal probability.

In what follows we shall speak of the hypothesis $(\theta_1, \theta_2)$ to mean
the hypothesis that $\theta_1$ is the value of the parameter in the distribution
density of $\pi_1$ and $\theta_2$ is the value of the parameter in the
distribution density of $\pi_2$. The hypothesis $(\theta_1^0, \theta_2^0)$ will
represent a specific hypothesis which we may wish to test and will be used to
define the decision function (the probability ratio) of the sequential test.

Let us fix $A > 1$ and $B < 1$ and set

$$z_\alpha = \log \frac{f(x_{2\alpha}, \theta_1^0)\, f(x_{1\alpha}, \theta_2^0)}{f(x_{2\alpha}, \theta_2^0)\, f(x_{1\alpha}, \theta_1^0)}, \tag{1.301}$$

where $x_{1\alpha}$ is the $\alpha$th observation from $\pi_1$, $x_{2\alpha}$ is
the $\alpha$th observation from $\pi_2$, and $(\theta_1^0, \theta_2^0)$ is the
particular hypothesis to be tested against the alternative hypothesis
$(\theta_2^0, \theta_1^0)$. Let $a = \log A$ and $-b = \log B$. Then $a$ and $b$
are positive. Since the observations from $\pi_1$ and $\pi_2$ are assumed to be
independent, $\log \lambda = \sum_{\alpha=1}^{n} z_\alpha$. Hence the proposed
sequential test can be carried out in the following manner. Draw one pair of
observations at a time, one from $\pi_1$ and one from $\pi_2$. Let
$z_1, z_2, \ldots$ be the values of $z_\alpha$ obtained from the first, second,
etc. trial. Let $Z_n = z_1 + z_2 + \cdots + z_n$ $(n = 1, 2, \ldots)$. Continue
sampling as long as $-b < Z_n < a$. Whenever $Z_n \geq a$
$(n = 1, 2, 3, \ldots)$, terminate sampling and accept $\pi_2$ (or $\pi_1$).
Whenever $Z_n \leq -b$ $(n = 1, 2, 3, \ldots)$, terminate sampling and accept
$\pi_1$ (or $\pi_2$).
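
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a definitive implementation: the function name `sequential_rank_test`, the labels `"upper"` and `"lower"`, and the unit-variance normal log density used as an example of $\log f(x, \theta)$ are all hypothetical.

```python
import math
from typing import Callable, Iterable, Tuple


def sequential_rank_test(
    pairs: Iterable[Tuple[float, float]],
    log_f: Callable[[float, float], float],
    theta1_0: float,
    theta2_0: float,
    a: float,
    b: float,
) -> Tuple[str, int, float]:
    """Accumulate z_alpha of (1.301) over pairs (x1, x2) until Z_n leaves (-b, a).

    Returns (label, n, Z_n), where label is "upper" if Z_n >= a,
    "lower" if Z_n <= -b, and "undecided" if the data ran out first.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    z_sum = 0.0
    n = 0
    for x1, x2 in pairs:
        n += 1
        # z = log [ f(x2, th1_0) f(x1, th2_0) / ( f(x2, th2_0) f(x1, th1_0) ) ]
        z_sum += (log_f(x2, theta1_0) + log_f(x1, theta2_0)
                  - log_f(x2, theta2_0) - log_f(x1, theta1_0))
        if z_sum >= a:
            return "upper", n, z_sum
        if z_sum <= -b:
            return "lower", n, z_sum
    return "undecided", n, z_sum


def normal_log_density(x: float, mu: float) -> float:
    """Log of the unit-variance normal density (an assumed example of log f)."""
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)
```

Which process is accepted at each boundary depends, as in the text, on whether the smaller or the larger $\theta$ is the desirable one, so the sketch reports only which boundary was reached.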

1.3a. Basic assumptions. In this section and throughout this paper, we shall
be dealing with sequential tests involving, as above, a decision function
$Z_n = z_1 + z_2 + \cdots + z_n$ $(n = 1, 2, \ldots, \text{ad inf.})$, where the
$z_\alpha$'s are independently distributed random variables having a common
distribution function. Let $z$ denote a random variable whose distribution is
the same as the common distribution of $z_\alpha$
$(\alpha = 1, 2, \ldots, \text{ad inf.})$. It will be assumed, even if not
explicitly stated, that the distribution of $z$ satisfies the following
conditions.

CONDITION I. Both the expected value $Ez$ of $z$ and the variance of $z$ exist
and are unequal to zero.

CONDITION II. There exists a positive $\delta$ such that
$P(e^z > 1 + \delta) > 0$ and $P(e^z < 1 - \delta) > 0$.

CONDITION III. For any real value $h$, the expected value $Ee^{hz} = g(h)$
exists.

CONDITION IV. The first two derivatives of the function $g(h)$ exist and may be
obtained by differentiating under the integral sign.

1.3b. Fundamental properties of sequential tests. Let $z$ be defined as in 1.3a.
Then under the assumption that the distribution of $z$ satisfies the conditions
specified, Wald [2] has proved the following:

LEMMA I. The probability that a decision is reached in a finite number of steps
is unity.

LEMMA II. There exists one and only one real value $h \neq 0$ such that the
expected value $Ee^{hz} = 1$.

FUNDAMENTAL IDENTITY: The fundamental identity
$E e^{tZ_n}[\varphi(t)]^{-n} = 1$ holds for all points in the complex plane for
which $|\varphi(t)| \geq 1$, where $\varphi(t) = Ee^{tz}$.
Let $w = \log \dfrac{f(x, \theta_2^0)}{f(x, \theta_1^0)}$ and let the
distribution density of $x$ be $f(x, \theta)$. Let $\theta_1$ and $\theta_2$ be
any two values of $\theta$ which may be distinct from $\theta_1^0$ and
$\theta_2^0$. Then it can easily be verified that if $w$ satisfies the
conditions specified in section 1.3a under the hypothesis $\theta = \theta_1$ as
well as the hypothesis $\theta = \theta_2$, and if moreover the expected values
of $w$ under these two hypotheses are not equal, then

$$z = \log \frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}$$

will also satisfy these conditions when the joint distribution density of $x_1$
and $x_2$ ($x_1$ representing the measurable characteristic in $\pi_1$ and $x_2$
in $\pi_2$) is either $f(x_1, \theta_1) f(x_2, \theta_2)$ or
$f(x_1, \theta_2) f(x_2, \theta_1)$.

In what follows, we shall assume that the distribution of $w$ satisfies the
required restrictions for the $\theta_1$ and $\theta_2$ under consideration and
that the expectation of $w$ under the hypothesis $\theta = \theta_1$ is unequal
to the expectation of $w$ under the hypothesis $\theta = \theta_2$.
Consequently, we shall assume that Lemmas I and II and the Fundamental Identity
hold for all the sequential tests we shall consider.


1.4 The power of the proposed test. Let $x_1$ be an observation from $\pi_1$ and
$x_2$ an observation from $\pi_2$. Let

$$z = \log \frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}, \tag{1.401}$$

where $\theta_1^0$ and $\theta_2^0$ are specified parameters in the probability
density of $\pi_1$ and $\pi_2$ respectively. Furthermore, let
$\varphi(t \mid \theta_1, \theta_2) = E(e^{tz} \mid \theta_1, \theta_2)$ be the
moment generating function of $z$ under the hypothesis $(\theta_1, \theta_2)$.
Then

$$E(e^{tz} \mid \theta_1, \theta_2) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left[\frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}\right]^{t} f(x_1, \theta_1)\, f(x_2, \theta_2)\, dx_1\, dx_2. \tag{1.402}$$

By Lemma II there exists one and only one real number $h \neq 0$ such that
$E(e^{hz} \mid \theta_1, \theta_2) = 1$. Let
$L_h = P(Z_n \leq -b \mid \theta_1, \theta_2)$ be the probability that the
sequential test terminates with $Z_n \leq -b$ under the hypothesis
$(\theta_1, \theta_2)$. Then by Lemma I,
$1 - L_h = P(Z_n \geq a \mid \theta_1, \theta_2)$. For any random variable $u$
considered under the hypothesis $(\theta_1, \theta_2)$, let the symbol $E_b(u)$
stand for the expected value of $u$ under the restriction that $Z_n \leq -b$ and
$E_a(u)$ stand for the expected value of $u$ under the restriction that
$Z_n \geq a$. In terms of the above definitions, the Fundamental Identity can be
expressed as follows:

$$L_h E_b\, e^{tZ_n}[\varphi(t \mid \theta_1, \theta_2)]^{-n} + (1 - L_h) E_a\, e^{tZ_n}[\varphi(t \mid \theta_1, \theta_2)]^{-n} = 1. \tag{1.403}$$

Setting $t = h$ in (1.403) we get

$$L_h E_b\, e^{hZ_n} + (1 - L_h) E_a\, e^{hZ_n} = 1. \tag{1.404}$$


Following Wald [2], we define a two-valued random variable $\bar{Z}_n$ in this
manner: $\bar{Z}_n = a$ if $Z_n \geq a$ and $\bar{Z}_n = -b$ if $Z_n \leq -b$.
Let $Z_n - \bar{Z}_n = \epsilon$. Then $\epsilon$ is also a random variable. In
what follows, we shall substitute 0 for $\epsilon$. The error committed in
neglecting $\epsilon$ is small when $\theta_1^0$ is close to $\theta_2^0$. As we
shall indicate later, the quantity $\epsilon$ can, in fact, be neglected without
error in the special case where $f(x, \theta)$ is the binomial distribution.

Substituting $\bar{Z}_n$ for $Z_n$ in (1.404) we get

$$L_h e^{-bh} + (1 - L_h) e^{ah} = 1. \tag{1.405}$$


Solving for $L_h$ we get²

$$L_h = \frac{1 - e^{ah}}{e^{-bh} - e^{ah}} = \frac{e^{(a+b)h} - e^{bh}}{e^{(a+b)h} - 1}. \tag{1.406}$$

As we shall see later, $h = 0$ when $\theta_1 = \theta_2$. But when $h = 0$,
$L_h$ in (1.406) is indeterminate. However, it can be easily seen that

$$\lim_{h \to 0} L_h = \frac{a}{a + b}. \tag{1.407}$$

It follows from (1.406) that the power of the test is constant for all
$\theta_1$ and $\theta_2$ which give the same root $t = h$. The quantity $h$ is
thus fundamental in this test and, as we shall see later, is an appropriate
measure of the difference between $\theta_1$ and $\theta_2$ for a large class of
distributions.
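
As a numerical companion to (1.406) and (1.407), the following Python sketch evaluates $L_h$ for given $a$, $b$ and $h$. The function name `prob_lower_boundary` is hypothetical, and the algebraic rearrangements in the two branches are an implementation choice made here to avoid overflow for large $|h|$; they are equivalent to (1.406) but do not appear in the paper.

```python
import math


def prob_lower_boundary(h: float, a: float, b: float) -> float:
    """Approximate L_h = P(Z_n <= -b) from (1.406), with the excess over the
    boundaries neglected as in the text."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    if h == 0.0:
        return a / (a + b)  # limiting value (1.407)
    if h > 0:
        # (1.406) with numerator and denominator multiplied by e^{-ah}:
        # (1 - e^{-ah}) / (1 - e^{-(a+b)h})
        return math.expm1(-a * h) / math.expm1(-(a + b) * h)
    # h < 0: multiply through by e^{bh} instead, so every exponent is negative:
    # e^{bh} (1 - e^{ah}) / (1 - e^{(a+b)h})
    return math.exp(b * h) * math.expm1(a * h) / math.expm1((a + b) * h)
```

With $a = b = \log 19$, for instance, the function gives 0.95 at $h = 1$, 0.05 at $h = -1$, and $\tfrac12$ at $h = 0$.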


1.5 Method of determining the sequential test. Let $z$ be defined as in (1.401)
and let $\varphi_1(t) = E(e^{tz} \mid \theta_1^0, \theta_2^0)$ be the moment
generating function of $z$ under the hypothesis $(\theta_1^0, \theta_2^0)$, and
let $\varphi_2(t) = E(e^{tz} \mid \theta_2^0, \theta_1^0)$ be the moment
generating function of $z$ under the hypothesis $(\theta_2^0, \theta_1^0)$.
Furthermore, let $\alpha = P(\bar{Z}_n = a \mid \theta_1^0, \theta_2^0)$ and
$\beta = P(\bar{Z}_n = -b \mid \theta_2^0, \theta_1^0)$. Then by Lemma I,
$1 - \alpha = P(\bar{Z}_n = -b \mid \theta_1^0, \theta_2^0)$ and
$1 - \beta = P(\bar{Z}_n = a \mid \theta_2^0, \theta_1^0)$. Now, applying Wald's
Fundamental Identity we have

$$(1 - \alpha)\, e^{-bt} E_{b1}[\varphi_1(t)]^{-n} + \alpha\, e^{at} E_{a1}[\varphi_1(t)]^{-n} = 1, \tag{1.501}$$

$$\beta\, e^{-bt} E_{b2}[\varphi_2(t)]^{-n} + (1 - \beta)\, e^{at} E_{a2}[\varphi_2(t)]^{-n} = 1, \tag{1.502}$$

where the symbol $E_{a1}$ stands for the conditional expectation knowing that
$\bar{Z}_n = a$ and $E_{b1}$ stands for the conditional expectation knowing that
$\bar{Z}_n = -b$, with both expectations taken under the hypothesis
$(\theta_1^0, \theta_2^0)$. The symbols $E_{a2}$ and $E_{b2}$ are similarly
defined but under the hypothesis $(\theta_2^0, \theta_1^0)$. Setting $t = 1$ in
(1.501) and $t = -1$ in (1.502), we get, in view of Corollary 2, Theorem 2
below,

$$(1 - \alpha)\, e^{-b} + \alpha\, e^{a} = 1, \tag{1.503}$$

$$\beta\, e^{b} + (1 - \beta)\, e^{-a} = 1. \tag{1.504}$$


² In what follows, $L_h$ will always stand for the probability that a sequential
test will terminate with $Z_n \leq -b$. In any given problem, the interpretation
of the event $Z_n \leq -b$ will be clear from the context.

Now $a = \log A$ and $-b = \log B$. Hence, equations (1.503) and (1.504) become

$$(1 - \alpha) B + \alpha A = 1, \tag{1.505}$$

$$\frac{\beta}{B} + \frac{1 - \beta}{A} = 1, \tag{1.506}$$

or

$$A = \frac{1 - \beta}{\alpha} \quad \text{and} \quad a = \log \frac{1 - \beta}{\alpha}, \tag{1.507}$$

$$B = \frac{\beta}{1 - \alpha} \quad \text{and} \quad b = \log \frac{1 - \alpha}{\beta}. \tag{1.508}$$

From (1.507) and (1.508) we see that the sequential test is completely
determined by the function $z$, which, in turn, is defined by $\theta_1^0$ and
$\theta_2^0$, and by the probabilities of making a decision for the two
hypotheses $(\theta_1^0, \theta_2^0)$ and $(\theta_2^0, \theta_1^0)$.
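
The relations (1.507) and (1.508) translate directly into a small helper. The Python sketch below is illustrative only: the names `Boundaries` and `boundaries_from_risks` are hypothetical, and the check that $\alpha + \beta < 1$ is added here because it is what makes $A > 1 > B$.

```python
import math
from typing import NamedTuple


class Boundaries(NamedTuple):
    A: float  # upper limit for the probability ratio lambda
    B: float  # lower limit for the probability ratio lambda
    a: float  # a = log A
    b: float  # b = -log B


def boundaries_from_risks(alpha: float, beta: float) -> Boundaries:
    """Boundaries of the sequential test from the risks, per (1.507)-(1.508)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be less than 1 so that A > 1 > B")
    A = (1.0 - beta) / alpha
    B = beta / (1.0 - alpha)
    return Boundaries(A=A, B=B, a=math.log(A), b=-math.log(B))
```

For $\alpha = 0.05$ and $\beta = 0.10$ this gives $A = 18$ and $B = 0.10/0.95$, and substituting back into (1.505) and (1.506) returns unity in both cases.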

Once $z$ is defined in terms of a specific $(\theta_1^0, \theta_2^0)$, the
probability that $Z_n \leq -b$ will be equal to $1 - \alpha$ and the probability
that $Z_n \geq a$ will be $\alpha$ (if we neglect the fact that $|Z_n|$, at a
decision point, might exceed $a$ or $b$) for the totality of hypotheses
$(\theta_1, \theta_2)$ for which the moment generating function
$\varphi(t \mid \theta_1, \theta_2) = 1$ when $t = 1$. A similar statement can be
made for the corresponding hypotheses $(\theta_2, \theta_1)$ for which the moment
generating function will equal unity when $t = -1$. Hence, we see that while the
test is defined by specifying two points $(\theta_1^0, \theta_2^0)$ and
$(\theta_2^0, \theta_1^0)$ in the parameter space, the pre-assigned risks
$\alpha$ and $\beta$ of making an erroneous decision will be approximately
constant on the set of points for which the moment generating function equals
unity when $t = 1$ and when $t = -1$, respectively. This set of points usually
will constitute a smooth curve.


If $\theta_1 = \theta_2$, $L_0 = \dfrac{a}{a + b}$ (by (1.407)). Hence, the
probability of accepting $\pi_1$ will be close to $\tfrac{1}{2}$ if $a$ is close
to $b$, and will equal $\tfrac{1}{2}$ if $a = b$. But from (1.507) and (1.508)
we see that $a = b$ if $\alpha = \beta$. Thus, if we construct a test which will
give a probability of rejecting $\pi_1$ when $(\theta_1^0, \theta_2^0)$ is true
equal to the probability of accepting $\pi_1$ when $(\theta_2^0, \theta_1^0)$ is
true, we shall be accepting $\pi_1$ and $\pi_2$ with equal frequency when in
fact $\theta_1 = \theta_2$.











1.6 The average number of pairs of observations required to reach a decision.
Let $E(n \mid \theta_1, \theta_2)$ be the expected number of pairs of
observations required to reach a decision under the hypothesis
$(\theta_1, \theta_2)$. We shall show that

$$E(n \mid \theta_1, \theta_2) = \frac{a(1 - L_h) - bL_h}{Ez}. \tag{1.601}$$

PROOF: Differentiating the Fundamental Identity,

$$E e^{tZ_n}[\varphi(t)]^{-n} = 1, \tag{1.602}$$
with respect to $t$, we get³

$$E\{Z_n e^{tZ_n}[\varphi(t)]^{-n} - n e^{tZ_n} \varphi'(t)[\varphi(t)]^{-n-1}\} = 0. \tag{1.603}$$

Setting $t = 0$, we get

$$EZ_n - \varphi'(0)\, E(n \mid \theta_1, \theta_2) = 0. \tag{1.604}$$

But

$$EZ_n = a(1 - L_h) - bL_h \tag{1.605}$$

and

$$\varphi'(0) = Ez. \tag{1.606}$$


Hence, solving for $E(n \mid \theta_1, \theta_2)$ in (1.604) and substituting
from (1.605) and (1.606) we get

$$E(n \mid \theta_1, \theta_2) = \frac{a(1 - L_h) - bL_h}{Ez} = \frac{(1 - L_h) \log A + L_h \log B}{Ez}. \tag{1.607}$$

While $L_h$ is approximately constant for all values of $(\theta_1, \theta_2)$
for which the moment generating function equals unity for $t = h$, the expected
value of $n$ given by (1.607) will depend on the particular hypothesis
$(\theta_1, \theta_2)$. This follows from the fact that $Ez$ is not necessarily
constant for the same set of points $(\theta_1, \theta_2)$ for which $L_h$ is
constant.
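
Formula (1.607) is simple enough to evaluate directly. The Python sketch below uses the hypothetical name `expected_pairs`. The worked value in the usage note assumes unit-variance normal populations with $(\theta_1^0, \theta_2^0) = (1, 0)$, for which (1.401) reduces to $z = x_2 - x_1$, so that $Ez = -1$ under that hypothesis.

```python
import math


def expected_pairs(L_h: float, a: float, b: float, mean_z: float) -> float:
    """Approximate E(n | theta_1, theta_2) from (1.607).

    L_h is the probability of ending at the lower boundary, a and b are the
    positive boundaries, and mean_z is Ez under the hypothesis considered.
    """
    if not 0.0 <= L_h <= 1.0:
        raise ValueError("L_h must be a probability")
    if mean_z == 0.0:
        raise ValueError("Ez must be different from zero (Condition I)")
    return (a * (1.0 - L_h) - b * L_h) / mean_z


# Assumed example: alpha = beta = 0.05, so a = b = log 19 by (1.507)-(1.508),
# and L_h = 1 - alpha = 0.95 under (theta_1^0, theta_2^0), where Ez = -1.
example = expected_pairs(L_h=0.95, a=math.log(19.0), b=math.log(19.0), mean_z=-1.0)
```

Under these assumptions the value is $0.9 \log 19$, or a little under three pairs; since the excess of $Z_n$ over the boundaries is neglected, the figure is only approximate.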


1.7 Some general properties of the proposed test.

THEOREM 1. Let
$z = \log \dfrac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}$,
where $x_1$ is an observation from $\pi_1$ and $x_2$ from $\pi_2$. Then if $F(z)$
is the distribution density of $z$ under the hypothesis $(\theta_1, \theta_2)$,
$F(-z)$ is the distribution density of $z$ under the hypothesis
$(\theta_2, \theta_1)$.

PROOF: Let $t$ be a real number and let
$\chi_1(t) = E(e^{itz} \mid \theta_1, \theta_2)$ be the characteristic function
of $z$ under the hypothesis $(\theta_1, \theta_2)$. Then

$$\chi_1(t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left[\frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}\right]^{it} f(x_1, \theta_1)\, f(x_2, \theta_2)\, dx_1\, dx_2. \tag{1.701}$$

Now let $\chi_2(t) = E(e^{-itz} \mid \theta_2, \theta_1)$ be the characteristic
function of $-z$ under the hypothesis $(\theta_2, \theta_1)$. Then

$$\chi_2(t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left[\frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}\right]^{-it} f(x_1, \theta_2)\, f(x_2, \theta_1)\, dx_1\, dx_2. \tag{1.702}$$

Interchanging the variables of integration in (1.702) we see that
$\chi_2(t) = \chi_1(t)$. Consequently, the distribution of $z$ under the
hypothesis $(\theta_1, \theta_2)$ is the same as the distribution of $-z$ under
the hypothesis $(\theta_2, \theta_1)$. This theorem in conjunction with the fact
that $E(z \mid \theta_1, \theta_2) \neq 0$ when $\theta_1 \neq \theta_2$ shows
that the decision function $z$ discriminates in a real sense between the two
alternative hypotheses $(\theta_1, \theta_2)$ and $(\theta_2, \theta_1)$.

³ This assumes that the Fundamental Identity can be differentiated with respect
to $t$. The results that follow can be derived without any reference to the
Fundamental Identity. See Wald [1], page 142.


THEOREM 2. Let $E(e^{tz} \mid \theta_1, \theta_2)$ be the moment generating
function of $z$ under the hypothesis $(\theta_1, \theta_2)$ and let
$E(e^{tz} \mid \theta_2, \theta_1)$ be the moment generating function of $z$
under the hypothesis $(\theta_2, \theta_1)$. Then, if $t = h$ is a root of the
equation $E(e^{tz} \mid \theta_1, \theta_2) = 1$, then $t = -h$ is a root of the
equation $E(e^{tz} \mid \theta_2, \theta_1) = 1$.

PROOF: The same as Theorem 1. As we have seen in Section 1.4, the power of the
proposed sequential test (neglecting $\epsilon$) depends only on $h$. This
theorem shows that if the probability of accepting $\pi_1$ is large under the
hypothesis $(\theta_1, \theta_2)$, it will be small under the hypothesis
$(\theta_2, \theta_1)$, and conversely.

COROLLARY 1. The only value of $t$ for which
$E(e^{tz} \mid \theta, \theta) = 1$ is $t = 0$. This follows from Theorem 2.

COROLLARY 2. The values of $t$ for which
$E(e^{tz} \mid \theta_1^0, \theta_2^0) = 1$ and
$E(e^{tz} \mid \theta_2^0, \theta_1^0) = 1$ are $t = 1$ and $t = -1$
respectively. This can be seen by expressing
$E(e^{tz} \mid \theta_1^0, \theta_2^0)$ as a double integral and setting
$t = 1$.

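THEOREM 3. Let $\omega$ be the totality of points $(\theta_1, \theta_2)$ in the
parameter space for which $\theta_1 < \theta_2$. Then a necessary and sufficient
condition that the values of $h$ (for which
$E(e^{hz} \mid \theta_1, \theta_2) = 1$) be of the same sign for all points in
$\omega$ is that

$$E(w \mid \theta) = \int_{-\infty}^{\infty} \log \frac{f(x, \theta_2^0)}{f(x, \theta_1^0)}\, f(x, \theta)\, dx \tag{1.703}$$

be a monotonic function of $\theta$.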

To prove this theorem we need the following lemma. 

LEMMA 1. Let $g(x, \theta)$ be the distribution density of $x$ and $\psi(t)$ its
moment generating function. Let $h$ be the real non-zero value of $t$ for which
$\psi(t) = 1$. Then the sign of $h$ is opposite to the sign of $Ex$ (the
expected value of $x$) if $Ex \neq 0$.

PROOF: For any random variable $u$, Wald [1] has shown that the inequality

$$Eu < \log Ee^{u} \tag{1.704}$$

holds.

Setting $u = tx$, where $t$ is a constant, we get

$$tEx < \log Ee^{tx} = \log \psi(t). \tag{1.705}$$

Setting $t = h$ in (1.705) we get $hEx < 0$. This proves the lemma.
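Now let $E(z \mid \theta_1, \theta_2)$ be the expected value of $z$ under the
hypothesis $(\theta_1, \theta_2)$, where $(\theta_1, \theta_2)$ belongs to
$\omega$. Then

$$\begin{aligned}
E(z \mid \theta_1, \theta_2) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \log \frac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}\, f(x_1, \theta_1)\, f(x_2, \theta_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty} \log \frac{f(x, \theta_2^0)}{f(x, \theta_1^0)}\, f(x, \theta_1)\, dx - \int_{-\infty}^{\infty} \log \frac{f(x, \theta_2^0)}{f(x, \theta_1^0)}\, f(x, \theta_2)\, dx \\
&= E(w \mid \theta_1) - E(w \mid \theta_2).
\end{aligned} \tag{1.706}$$

From (1.706) we see that if $E(w \mid \theta)$ is monotonic in $\theta$,
$E(z \mid \theta_1, \theta_2)$ will have a constant sign for all points
$(\theta_1, \theta_2)$ in $\omega$ and hence by Lemma 1, $h$ will have a
constant sign. Conversely, if $h$ is of constant sign for all
$(\theta_1, \theta_2)$ in $\omega$, so will $E(z \mid \theta_1, \theta_2)$ be.
Consequently, by (1.706), $E(w \mid \theta)$ must be monotonic.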

COROLLARY 1. Let $E(w \mid \theta)$ be a monotonic function of $\theta$ and let
$\omega_h$ $(h \neq 0)$ be the totality of points $(\theta_1, \theta_2)$ in the
parameter space for which the power of the sequential test is constant. Then the
coordinates of the points $(\theta_1, \theta_2)$ in $\omega_h$ are such that
either every $\theta_1 < \theta_2$ or every $\theta_1 > \theta_2$.

PROOF: By assumption all points in $\omega_h$ have the same power. Since $L_h$
in (1.406) is a strictly increasing function of $h$, the points in $\omega_h$
must yield the same $h$. However, if we assume that $\omega_h$ contains a point
$(\theta_1', \theta_2')$ with $\theta_1' < \theta_2'$ and a point
$(\theta_1'', \theta_2'')$ with $\theta_1'' > \theta_2''$, the sign of
$E(z \mid \theta_1', \theta_2')$ by (1.706) will be opposite to the sign of
$E(z \mid \theta_1'', \theta_2'')$. Hence, the value of $h$ yielded by
$(\theta_1', \theta_2')$ is opposite in sign to that yielded by
$(\theta_1'', \theta_2'')$, which contradicts the assumption that both points
yield the same $h$.

Theorem 3 and Corollary 1 show that if $E(w \mid \theta)$ is monotonic in
$\theta$, the proposed sequential test is unbiased in the sense that all points
$(\theta_1, \theta_2)$ that lie on the curve $h = \text{constant}$ (and hence
have the same power) will have the property that either the inequality
$\theta_1 < \theta_2$ holds or the inequality $\theta_1 > \theta_2$ holds. The
equality sign will hold if and only if $h = 0$.

1.8 The proposed test applied to distributions which admit sufficient
statistics. Let $f(x, \theta)$ admit a sufficient estimate of $\theta$. Then it
is well known that $f(x, \theta)$ can be written in the form⁴

$$f(x, \theta) = e^{u(x)v(\theta) + r(x) + w(\theta)}. \tag{1.801}$$

Setting
$z = \log \dfrac{f(x_2, \theta_1^0)\, f(x_1, \theta_2^0)}{f(x_2, \theta_2^0)\, f(x_1, \theta_1^0)}$,
we see that for this class of distributions the decision function assumes the
simple form:

$$z = [u(x_2) - u(x_1)][v(\theta_1^0) - v(\theta_2^0)]. \tag{1.802}$$

Let $a^* = \dfrac{a}{v(\theta_1^0) - v(\theta_2^0)}$ and
$b^* = \dfrac{b}{v(\theta_1^0) - v(\theta_2^0)}$. Then the decision function
becomes

$$z^* = u(x_2) - u(x_1). \tag{1.803}$$

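We shall now show that, for this class of distributions, the power of the
sequential test is a function of $v(\theta_1) - v(\theta_2)$. To prove this, it
is only necessary to show that $E(e^{tz^*} \mid \theta_1, \theta_2)$ equals
unity for $t = v(\theta_1) - v(\theta_2)$. Now

$$\begin{aligned}
E(e^{tz^*} \mid \theta_1, \theta_2) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{t[u(x_2) - u(x_1)]} f(x_1, \theta_1)\, f(x_2, \theta_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{u(x_1)[v(\theta_1) - t] + u(x_2)[t + v(\theta_2)] + r(x_1) + r(x_2) + w(\theta_1) + w(\theta_2)}\, dx_1\, dx_2.
\end{aligned} \tag{1.804}$$

If we set $t = v(\theta_1) - v(\theta_2)$ in (1.804), the integrand becomes
$f(x_1, \theta_2) f(x_2, \theta_1)$, which integrates to unity, and the
statement is proved.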
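
The root $t = v(\theta_1) - v(\theta_2)$ can be confirmed in closed form for a particular member of the class (1.801). The Python sketch below takes the Poisson law as an assumed example (it is not one of the cases treated in this section), for which $u(x) = x$, $v(\theta) = \log \theta$, and the moment generating function of $x$ is $\exp\{\theta(e^{s} - 1)\}$. The function name is hypothetical.

```python
import math


def poisson_mgf_of_z_star(t: float, theta1: float, theta2: float) -> float:
    """E exp{t (x2 - x1)} for independent Poisson x1 (mean theta1), x2 (mean theta2).

    Uses E exp(s x) = exp(theta (e^s - 1)) for a Poisson variable with mean theta,
    applied with s = -t to x1 and s = t to x2.
    """
    if theta1 <= 0 or theta2 <= 0:
        raise ValueError("Poisson means must be positive")
    return math.exp(theta1 * math.expm1(-t) + theta2 * math.expm1(t))


def root_h(theta1: float, theta2: float) -> float:
    """h = v(theta1) - v(theta2) with v(theta) = log(theta), per Section 1.8."""
    return math.log(theta1) - math.log(theta2)
```

For $\theta_1 = 2$ and $\theta_2 = 5$ the exponent at $t = \log 2 - \log 5$ is $2(2.5 - 1) + 5(0.4 - 1) = 0$, so the moment generating function equals unity there, as the argument following (1.804) requires.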


4 See, for example, [3]. 









Let $E(n \mid h)$ be the average number of pairs of observations required to
reach a decision when $v(\theta_1) - v(\theta_2) = h$. Then by formula (1.607)
we have

$$E(n \mid h) = \frac{a^*(1 - L_h) - b^* L_h}{E\{u(x_2) - u(x_1)\}} = \frac{(1 - L_h) \log A + L_h \log B}{h_0\, E\{u(x_2) - u(x_1)\}}, \tag{1.805}$$

where $h_0 = v(\theta_1^0) - v(\theta_2^0)$. Since the expected value of $u(x)$
will not necessarily equal $v(\theta)$, the average number of pairs of
observations required to reach a decision will depend not only on
$v(\theta_1) - v(\theta_2)$ but also on the particular hypothesis
$(\theta_1, \theta_2)$ considered.

Since the power of the test for this class of distributions depends on
$v(\theta_1) - v(\theta_2)$, it will be constant for all $\theta_1$ and
$\theta_2$ which lie on the curve defined by
$v(\theta_1) - v(\theta_2) = \text{constant}$. In particular, if the sequential
test is defined with risks $\alpha$ and $\beta$, the probability of accepting
$\pi_1$ (or $\pi_2$) will be approximately $\alpha$ for all hypotheses
$(\theta_1, \theta_2)$ which lie on the curve defined by
$v(\theta_1) - v(\theta_2) = v(\theta_1^0) - v(\theta_2^0) = h_0$, and the
probability of accepting $\pi_2$ (or $\pi_1$) will be approximately $\beta$ for
all hypotheses $(\theta_1, \theta_2)$ which lie on the curve defined by
$v(\theta_2) - v(\theta_1) = h_0$. Now, the decision function $z^*$ as well as
the boundaries $a^*$ and $b^*$ will be identical for all sequential tests
provided they are defined by the same risks $\alpha$ and $\beta$ and the
parameters $\theta_1^0$ and $\theta_2^0$ which determine the decision function
all lie on the curve $v(\theta_1) - v(\theta_2) = h_0$. Since Wald [1] has
proved that the sequential probability ratio test minimizes $E(n)$, the expected
number of observations required to reach a decision, when the hypothesis tested
is true as well as when the alternative hypothesis is true, it must follow that
in the case under consideration $E(n)$ is minimized for all hypotheses
$(\theta_1, \theta_2)$ which lie either on the curve defined by
$v(\theta_1) - v(\theta_2) = h_0$ or on the curve defined by
$v(\theta_2) - v(\theta_1) = h_0$. If $v(\theta)$ is a monotonic function of
$\theta$, then the test is unbiased (i.e. all points $(\theta_1, \theta_2)$
which lie on the curve $v(\theta_1) - v(\theta_2) = \text{constant}$ will have
the property that either every $\theta_1 < \theta_2$ or every
$\theta_1 > \theta_2$).

For this type of distribution, the importance of the difference between
$\theta_1$ and $\theta_2$ may be measured by $v(\theta_1) - v(\theta_2)$. We
shall now show that the function $v(\theta_1) - v(\theta_2)$ is an appropriate
measure of the difference between these parameters for a wide class of
distributions which often occur in practice.


1.9 The proposed test applied to known distributions.

1.9a. The problem of discriminating between means when the variances are known.
Let $f(x, \mu)$ be a normal distribution function with unknown mean $\mu$ and
known variance $\sigma^2$ which we shall assume, without loss of generality, to
be unity. Let $x_1$ be an observation from $\pi_1$ and $x_2$ an observation from
$\pi_2$. Let the distribution density of $x_1$ be designated by $f(x_1, \mu_1)$
and that of $x_2$ by $f(x_2, \mu_2)$. The problem is to decide which process has
the larger $\mu$.

Since $f(x, \mu)$ is a normal distribution, it is given by

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \mu)^2}. \tag{1.901}$$

Hence $f(x, \mu)$ is of the form considered in Section 1.8 with $u(x) = x$ and
$v(\mu) = \mu$. Therefore, the decision function is given by

$$z^*_\alpha = x_{2\alpha} - x_{1\alpha} \tag{1.902}$$
and the power of the test depends on $h = \mu_1 - \mu_2$ and is given by (1.406)
with $a$ and $b$ replaced by $a^*$ and $b^*$, respectively.

The sequential test is performed in the following manner: We take a pair of
observations, one from $\pi_1$ and one from $\pi_2$, in sequence. If at any
stage $\sum_{\alpha=1}^{n} (x_{2\alpha} - x_{1\alpha}) \leq -b^*$, we accept the
hypothesis that $\pi_1$ has the larger mean. If, however, at any stage
$\sum_{\alpha=1}^{n} (x_{2\alpha} - x_{1\alpha}) \geq a^*$, we accept the
hypothesis that $\pi_2$ has the larger mean. If neither holds, we continue
sampling. According to section 1.8,
$a^* = \dfrac{\log A}{\mu_1^0 - \mu_2^0}$ and
$-b^* = \dfrac{\log B}{\mu_1^0 - \mu_2^0}$, where $\mu_1^0 - \mu_2^0$ is assumed
to be positive.

In order to determine a sequential test, we must fix $a^*$ and $b^*$. That is,
we must fix the quantities $\mu_1^0 - \mu_2^0$, $A$, and $B$. This can be
accomplished by deciding: (1) the smallest difference between the means of the
two processes which is considered worth detecting, which determines
$h_0 = \mu_1^0 - \mu_2^0$ and which we shall assume to be positive; (2) the
maximum probability $\alpha$ of rejecting the hypothesis that $\pi_1$ has the
larger mean when in fact $\mu_1$ in $\pi_1$ differs from $\mu_2$ in $\pi_2$ by
as much as $h_0$; and (3) the maximum probability $\beta$ of accepting the
hypothesis that $\pi_1$ has the larger mean when in fact the difference between
$\mu_1$ and $\mu_2$ is as large as $h_0$ negatively.⁵ When $\alpha$ and $\beta$
are fixed, $A$ and $B$ are determined by equations (1.507) and (1.508).
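
The test of this section can be sketched as follows in Python. The function name `compare_means` and the returned labels are hypothetical; unit variances are assumed as in the text, and the boundaries are taken from (1.507) and (1.508) divided by $h_0 = \mu_1^0 - \mu_2^0 > 0$.

```python
import math
from typing import Iterable, Tuple


def compare_means(
    pairs: Iterable[Tuple[float, float]],
    h0: float,
    alpha: float,
    beta: float,
) -> Tuple[str, int]:
    """Sequential comparison of two unit-variance normal means.

    pairs yields (x1, x2), one observation from each process. Returns
    ("pi_1", n) if pi_1 is judged to have the larger mean, ("pi_2", n) if
    pi_2 is, and ("undecided", n) if the data ran out first.
    """
    if h0 <= 0:
        raise ValueError("h0 must be positive")
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    a_star = math.log((1.0 - beta) / alpha) / h0    # a* = log A / h0
    b_star = -math.log(beta / (1.0 - alpha)) / h0   # -b* = log B / h0
    total = 0.0
    n = 0
    for x1, x2 in pairs:
        n += 1
        total += x2 - x1  # z* of (1.902)
        if total <= -b_star:
            return "pi_1", n
        if total >= a_star:
            return "pi_2", n
    return "undecided", n
```

With $h_0 = 0.5$ and $\alpha = \beta = 0.05$ the boundaries are $a^* = b^* = 2 \log 19 \approx 5.89$, so a run of pairs in which $x_1$ exceeds $x_2$ by exactly one unit each time ends in favor of $\pi_1$ at the sixth pair.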

1.9b. The problem of discriminating between variances when the means are known.
Let us assume that the distributions of $x_1$ in $\pi_1$ and $x_2$ in $\pi_2$
are normal with known means but unknown variances. We are required to choose
that process which has the smaller variance. Without any loss of generality we
shall suppose that the means of $x_1$ and $x_2$ are zero. Since $f(x, \sigma)$
is normal, it is given by

$$f(x, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2} = e^{-(x^2/2\sigma^2) - \log \sigma\sqrt{2\pi}}, \tag{1.903}$$

which is of the form considered in Section 1.8 with $u(x) = x^2$ and
$v(\sigma) = -\dfrac{1}{2\sigma^2}$. Hence the decision function $z^*$ is given
by

$$z^* = x_2^2 - x_1^2 \tag{1.904}$$

and the power of the test depends on
$h = \tfrac{1}{2}(\sigma_2^{-2} - \sigma_1^{-2})$ and is given by (1.406) with
$a$ and $b$ replaced by $a^*$ and $b^*$, respectively. The sequential test is
performed in the following manner: We take one pair of observations at a time,
one from $\pi_1$ and one from $\pi_2$. We continue sampling as long as
$\sum_{\alpha=1}^{n} (x_{2\alpha}^2 - x_{1\alpha}^2)$ lies between $-b^*$ and
$a^*$. Whenever $\sum_{\alpha=1}^{n} (x_{2\alpha}^2 - x_{1\alpha}^2) \geq a^*$,
we conclude that $\sigma_2^2 > \sigma_1^2$.


⁵ The power curve defined by (1.406) is a monotonic function of
$h = \mu_1 - \mu_2$. Hence the probability of rejecting the hypothesis that
$\pi_1$ has the larger mean is $\leq \alpha$ whenever $\mu_1 - \mu_2 \geq h_0$.
Thus $\alpha$ is in fact the maximum risk of making an erroneous decision. A
similar statement can be made concerning the risk $\beta$.

Whenever $\sum_{\alpha=1}^{n} (x_{2\alpha}^2 - x_{1\alpha}^2) \leq -b^*$, we
conclude that $\sigma_2^2 < \sigma_1^2$. The quantities $a^*$ and $b^*$ are
defined by

$$a^* = \frac{\log A}{\tfrac{1}{2}[(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}]}, \qquad -b^* = \frac{\log B}{\tfrac{1}{2}[(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}]}.$$

Thus $a^*$ and $b^*$ are defined by a specific value of
$(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}$ and $A$ and $B$. If we take
$(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}$ as negative, then
$A = \dfrac{\beta}{1 - \alpha}$ and $B = \dfrac{1 - \beta}{\alpha}$, where
$\alpha$ is the probability of concluding that $\sigma_1^2 < \sigma_2^2$ when in
fact
$\sigma_2^{-2} - \sigma_1^{-2} = -[(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}]$ and
$\beta$ is the probability of concluding that $\sigma_2^2 < \sigma_1^2$ when in
fact
$\sigma_2^{-2} - \sigma_1^{-2} = [(\sigma_2^0)^{-2} - (\sigma_1^0)^{-2}]$.

1.9c. The problem of discriminating between variances when the means are
unknown. Let the measured characteristics in $\pi_1$ and $\pi_2$ be assumed to
be normally distributed with unknown means and unknown variances. We desire to
choose, on the basis of a sequential test, that process which has the smaller
variance no matter what the means are. This will be accomplished by reducing the
problem to that treated in Section 1.9b.













Let $x_{11}, x_{12}, x_{13}, \cdots$ be the successive observations from $\pi_1$ and $x_{21}, x_{22}, x_{23}, \cdots$
the successive observations from $\pi_2$. Consider the transformation

$y_{11} = \dfrac{1}{\sqrt{2}}\, x_{11} - \dfrac{1}{\sqrt{2}}\, x_{12},$

$y_{12} = \dfrac{1}{\sqrt{2 \cdot 3}}\, x_{11} + \dfrac{1}{\sqrt{2 \cdot 3}}\, x_{12} - \dfrac{2}{\sqrt{2 \cdot 3}}\, x_{13},$

$\cdots$

$y_{1(n-1)} = \dfrac{1}{\sqrt{n(n-1)}}\, x_{11} + \dfrac{1}{\sqrt{n(n-1)}}\, x_{12} + \cdots - \dfrac{n-1}{\sqrt{n(n-1)}}\, x_{1n},$

with $y_{21}, y_{22}, \cdots, y_{2(n-1)}, \cdots$ similarly defined in terms of $x_{21}, x_{22}, \cdots, x_{2n}, \cdots$.
It is obvious that this transformation can be applied sequentially. Moreover
it is easy to show that
(1) The expected values of the $y$'s are zero.
(2) The variances of the $y$'s are the same as the variances of the $x$'s.
(3) The $y$'s are normally and independently distributed.
Hence we can apply the sequential test developed in Section 1.9b to the $y$'s
without any alterations. The decision function $Z_n^*$ will be given by

(1.905)    $Z_n^* = \sum_{\alpha=1}^{n} (y_{2\alpha}^2 - y_{1\alpha}^2).$

But it can be easily shown that

$\sum_{\alpha=1}^{n} y_{2\alpha}^2 = \sum_{\alpha=1}^{n+1} (x_{2\alpha} - \bar{x}_2)^2, \qquad \sum_{\alpha=1}^{n} y_{1\alpha}^2 = \sum_{\alpha=1}^{n+1} (x_{1\alpha} - \bar{x}_1)^2,$

where $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of the observations in $\pi_1$ and $\pi_2$ respectively.
Hence (1.905) is equivalent to

(1.906)    $Z_n^* = \sum_{\alpha=1}^{n+1} (x_{2\alpha} - \bar{x}_2)^2 - \sum_{\alpha=1}^{n+1} (x_{1\alpha} - \bar{x}_1)^2.$

Thus, to perform this sequential test, the population means need not be known.
The only difference between the tests considered in 1.9b and 1.9c is that 1.9c
requires one additional pair of observations.⁶
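
The equivalence of (1.905) and (1.906) can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the function names `helmert_stream` and `centered_sum_of_squares` and the simulated data are hypothetical. It applies the transformation one observation at a time and compares the two forms of the decision function.

```python
import math
import random

def helmert_stream(xs):
    """Yield y_1, y_2, ... from x_1, x_2, ... one observation at a time.

    y_k = (x_1 + ... + x_k - k * x_{k+1}) / sqrt(k * (k + 1)),
    so y_k becomes available as soon as x_{k+1} has been observed.
    """
    running_sum = 0.0
    k = 0  # number of observations already absorbed into running_sum
    for x in xs:
        if k > 0:
            yield (running_sum - k * x) / math.sqrt(k * (k + 1))
        running_sum += x
        k += 1

def centered_sum_of_squares(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

# Illustrative data only: unknown, unequal means in the two processes.
rng = random.Random(0)
x1 = [rng.gauss(5.0, 1.0) for _ in range(8)]
x2 = [rng.gauss(-3.0, 2.0) for _ in range(8)]

y1 = list(helmert_stream(x1))
y2 = list(helmert_stream(x2))

# Decision function computed from the y's, as in (1.905) ...
z_from_y = sum(b * b - a * a for a, b in zip(y1, y2))
# ... and from the x's about their sample means, as in (1.906).
z_from_x = centered_sum_of_squares(x2) - centered_sum_of_squares(x1)
print(len(y1), abs(z_from_y - z_from_x) < 1e-9)
```

Eight observations from each process give seven $y$'s, which illustrates the one additional pair of observations required.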

6 The method employed here was discovered independently by Charles Stein and the
author as a solution to a different sequential problem.

1.9d. The problem of discriminating between means when the variates have a
Poisson distribution. Let the distribution of $x_1$ in $\pi_1$ be given by $\dfrac{e^{-m_1} m_1^{x_1}}{x_1!}$ and
the distribution of $x_2$ in $\pi_2$ be given by $\dfrac{e^{-m_2} m_2^{x_2}}{x_2!}$, where $x_1$ and $x_2$ each take on the
values $0, 1, 2, \cdots$. It is desired to test the hypothesis that the mean in $\pi_1$
is smaller than the mean in $\pi_2$ against the alternative that the reverse is true.
Since the Poisson distribution can be written as

(1.907)    $f(x, m) = \dfrac{1}{x!}\, e^{x \log m - m},$

it is of the form considered in Section 1.8 with $u(x) = x$ and $v(m) = \log m$.
Hence the decision function $z^*$ is given by

$z^* = x_2 - x_1$

and the power of the test depends on $h = \log \dfrac{m_1}{m_2}$. The sequential test is performed
in the following manner: We take one observation from $\pi_1$ and one from
$\pi_2$ in succession. If at any stage $\sum_{\alpha=1}^{n} (x_{2\alpha} - x_{1\alpha}) \leq -b^*$, we conclude that $m_2$
is smaller than $m_1$. If $\sum_{\alpha=1}^{n} (x_{2\alpha} - x_{1\alpha}) \geq a^*$, we conclude that $m_1$ is smaller than
$m_2$. If neither holds, we take another pair of observations. This process is
continued until one or the other decision is reached. The quantities $a^*$ and $b^*$
are given by

(1.908)    $a^* = \dfrac{\log \dfrac{\beta}{1-\alpha}}{\log u_0},$

(1.909)    $b^* = -\dfrac{\log \dfrac{1-\beta}{\alpha}}{\log u_0},$

where $u_0 = m_1^0/m_2^0$ which is assumed to be less than one, $\alpha$ is the desired probability
of concluding that $m_2$ is smaller than $m_1$ when in fact $m_1/m_2 = u_0 < 1$, and
$\beta$ is the probability of concluding that $m_1$ is smaller than $m_2$ when in fact
$m_1/m_2 = 1/u_0$. The power curve is given by

(1.910)    $L_u = \dfrac{u^{a^*+b^*} - u^{b^*}}{u^{a^*+b^*} - 1},$

where $u = m_1/m_2$.
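
The boundaries and the power curve of this section can be sketched in Python as follows. This is an illustration under stated assumptions and not part of the original paper: the function names are hypothetical, $b^*$ is taken to be positive (the minus sign is written explicitly), the boundary comparisons are taken as inclusive, and the value at $u = 1$ is taken as the limit of (1.910).

```python
import math

def poisson_boundaries(u0, alpha, beta):
    """Boundaries a*, b* of (1.908)-(1.909) for u0 = m1^0 / m2^0 < 1."""
    log_u0 = math.log(u0)
    a_star = math.log(beta / (1.0 - alpha)) / log_u0
    b_star = -math.log((1.0 - beta) / alpha) / log_u0  # assumed positive
    return a_star, b_star

def power_curve(u, a_star, b_star):
    """L_u of (1.910): probability of stopping at the lower boundary -b*."""
    if abs(u - 1.0) < 1e-12:
        return a_star / (a_star + b_star)  # limiting value as u -> 1
    top = u ** (a_star + b_star)
    return (top - u ** b_star) / (top - 1.0)

def run_test(pairs, a_star, b_star):
    """Apply the sequential rule to (x1, x2) pairs; return (decision, n)."""
    total = 0
    n = 0
    for x1, x2 in pairs:
        n += 1
        total += x2 - x1
        if total <= -b_star:
            return "m2 smaller", n
        if total >= a_star:
            return "m1 smaller", n
    return "undecided", n

a_star, b_star = poisson_boundaries(u0=0.5, alpha=0.05, beta=0.10)
print(a_star, b_star)
print(power_curve(0.5, a_star, b_star), power_curve(2.0, a_star, b_star))
print(run_test([(3, 1), (2, 0), (4, 3)], a_star, b_star))
```

With these definitions the power curve returns $\alpha$ at $u = u_0$ and $1 - \beta$ at $u = 1/u_0$, which agrees with the stated meaning of the two risks.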

1.9e. Double dichotomies.⁷ We are given two processes $\pi_1$ and $\pi_2$, one yielding
a fraction defective $p_1$ and the other $p_2$. We shall assume that $p_1$ and $p_2$ are
unknown. We desire to choose on the basis of a sample that process which gives
the smaller fraction defective. That is, we wish to devise a test which gives a
high probability of accepting $\pi_1$ if $p_1 < p_2$ and a high probability of accepting
$\pi_2$ if $p_2 < p_1$. If $p_1 = p_2$, we might be more or less indifferent as to which
process we select.

Before we can answer this question, we must decide: (a) the minimum difference
between the two processes which we consider worth detecting; and (b)
if the two processes differ at least by the amount specified in (a), the minimum
probability with which we desire to make the correct decision.

In the proposed test, the decision function is given by $z^* = x_2 - x_1$, where
$x_i$, $(i = 1, 2)$, takes on the values 0 or 1, depending on whether the $i$th process
yields a nondefective or defective item. The difference between the two processes
is measured by⁸ $u = \dfrac{p_1}{q_1} \Big/ \dfrac{p_2}{q_2} = \dfrac{p_1 q_2}{p_2 q_1}$ (the ratio of the odds). It can easily be
seen that when $u < 1$, $p_1 < p_2$ and when $u > 1$, $p_1 > p_2$. If $u = 1$, $p_1 = p_2$.
Let $u_0$ represent a quantity less than 1. Furthermore, let $\alpha$ be the probability
of accepting $\pi_2$ when in fact the point $(p_1, p_2)$ lies on the curve $\dfrac{p_1 q_2}{q_1 p_2} = u_0$; and
$\beta$ be the probability of accepting $\pi_1$ when in fact the true point $(p_1, p_2)$ lies on the
curve $\dfrac{p_2 q_1}{q_2 p_1} = u_0$. Once $u_0$, $\alpha$ and $\beta$ are chosen, we compute

$a^* = \dfrac{\log \dfrac{\beta}{1-\alpha}}{\log u_0}, \qquad b^* = -\dfrac{\log \dfrac{1-\beta}{\alpha}}{\log u_0}.$

7 For a solution of a more general problem in double dichotomies using a different
approach, see [1], section 5.32 and [4] section 3.

8 This follows from the fact that the binomial distribution can be written as $f(x, p) = e^{x \log(p/q) + \log q}$
where $x$ takes on the values 0 or 1. Hence the distribution is of the form
considered in section 1.8 with $v(p) = \log p/q$, $w(p) = \log q$, and $z^* = x_2 - x_1$.

We then proceed as follows: We take one item from each process in sequence
and cumulate the number of defectives $d_1$ in process $\pi_1$ and $d_2$ in process $\pi_2$.
Whenever $d_2 - d_1 \leq -b^*$, we choose process $\pi_2$. Whenever $d_2 - d_1 \geq a^*$,
we choose process $\pi_1$. Whenever $d_2 - d_1$ lies between $a^*$ and $-b^*$, we take
another pair of observations, one from each process. This procedure is continued
until one or the other decision is reached.
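
The procedure just described can be sketched in Python. This is a minimal illustration, not part of the original paper; the function names and the inspection record are hypothetical, $b^*$ is assumed positive, and the boundary comparisons are taken as inclusive.

```python
import math

def dichotomy_boundaries(u0, alpha, beta):
    """a* and b* for a chosen odds ratio u0 < 1 and risks alpha, beta."""
    log_u0 = math.log(u0)
    a_star = math.log(beta / (1.0 - alpha)) / log_u0
    b_star = -math.log((1.0 - beta) / alpha) / log_u0  # assumed positive
    return a_star, b_star

def choose_process(items, a_star, b_star):
    """items yields (x1, x2), with 1 for a defective item and 0 otherwise.

    Returns (chosen process or None, number of pairs inspected).
    """
    d1 = d2 = 0
    n = 0
    for x1, x2 in items:
        n += 1
        d1 += x1
        d2 += x2
        diff = d2 - d1
        if diff <= -b_star:
            return "pi_2", n   # process 2 shows fewer defectives
        if diff >= a_star:
            return "pi_1", n   # process 1 shows fewer defectives
    return None, n             # record exhausted before a decision

a_star, b_star = dichotomy_boundaries(u0=0.5, alpha=0.05, beta=0.05)
# Hypothetical record: process 1 defective, process 2 good, five times running.
record = [(1, 0)] * 5
print(a_star, b_star, choose_process(record, a_star, b_star))
```

Pairs in which both items are defective or both are good leave $d_2 - d_1$ unchanged, so only the discordant pairs move the test toward a decision.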

1.9e1. The exact value of the power function for double dichotomies. Since
$d_2 - d_1$ changes at most in steps of one unit, it must follow that whenever a decision
is reached at $a^*$, the difference between $a^*$ and $d_2 - d_1$ is either zero (if
$a^*$ is an integer), or the difference between $a^*$ and $d_2 - d_1$ is constant for all
values of $n$. A similar argument holds for $b^*$. This permits us to compute the
power function without any approximations. Let $\bar{a}$ be the next positive integer
larger than $a^*$ if $a^*$ is not an integer, and $\bar{a} = a^*$ if $a^*$ is an integer. Let $\bar{b}$ be
the next positive integer larger than $b^*$ if $b^*$ is not an integer, and $\bar{b} = b^*$ if $b^*$ is
an integer. Then we see that the equation (1.406) for the power curve can be
given without any approximations by the formula

(1.9101)    $L_u = \dfrac{u^{\bar{a}+\bar{b}} - u^{\bar{b}}}{u^{\bar{a}+\bar{b}} - 1}.$

1.9e2. The exact average sample number for double dichotomies. Let $Z_n = d_2 - d_1$
and let the point $(p_1, p_2)$ be on some curve $\dfrac{p_1 q_2}{p_2 q_1} = u$. Let $E(n \mid p_1, p_2)$ be
the expected number of pairs of observations required before a decision is reached.
Let $L_u$ = probability of reaching $-\bar{b}$ (i.e., $L_u$ is the probability that $\pi_2$ is accepted).
Then $1 - L_u$ is the probability of reaching $\bar{a}$ (i.e., $1 - L_u$ is the probability
that $\pi_1$ is accepted). Then by Wald's Fundamental Identity we have⁹

(1.911)    $E Z_n = E z \, E(n \mid p_1, p_2).$

Now, $Ez = p_2 - p_1$, and $E Z_n = -L_u \bar{b} + (1 - L_u)\bar{a}$. Hence

(1.912)    $E(n \mid p_1, p_2) = \dfrac{(1 - L_u)\bar{a} - L_u \bar{b}}{p_2 - p_1}.$

9 For a derivation of formula (1.911) which does not depend on the Fundamental Identity,
see Wald [1], page 142.

It will be noted that while $L_u$ depends only on $u = \dfrac{p_1 q_2}{p_2 q_1}$, $E(n \mid p_1, p_2)$ depends not
only on the ratio of the odds but also on the difference between the two fractions
defective.
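
Formulas (1.9101) and (1.912) are simple enough to evaluate directly. The Python sketch below is an illustration and not part of the original paper; the function names are hypothetical, the value at $u = 1$ is taken as the limit of (1.9101), and the case $p_1 = p_2$, where (1.912) is indeterminate, is left out by raising an error.

```python
import math

def integer_boundary(value):
    """Next integer at or above a* (or b*), as in section 1.9e1."""
    return math.ceil(value)

def exact_power(p1, p2, a_bar, b_bar):
    """L_u of (1.9101): probability that pi_2 is accepted."""
    u = (p1 * (1.0 - p2)) / (p2 * (1.0 - p1))
    if abs(u - 1.0) < 1e-12:
        return a_bar / (a_bar + b_bar)  # limiting value as u -> 1
    top = u ** (a_bar + b_bar)
    return (top - u ** b_bar) / (top - 1.0)

def average_sample_number(p1, p2, a_bar, b_bar):
    """E(n | p1, p2) of (1.912): expected number of pairs inspected."""
    if p1 == p2:
        raise ValueError("(1.912) is indeterminate when p1 == p2")
    L = exact_power(p1, p2, a_bar, b_bar)
    return ((1.0 - L) * a_bar - L * b_bar) / (p2 - p1)

a_bar = integer_boundary(1.3)   # hypothetical a* = 1.3 rounds up to 2
b_bar = integer_boundary(2.0)   # hypothetical b* = 2.0 is already an integer
L = exact_power(0.1, 0.2, a_bar, b_bar)
asn = average_sample_number(0.1, 0.2, a_bar, b_bar)
print(a_bar, b_bar, L, asn)
```

For $p_1 = 0.1$, $p_2 = 0.2$ and $\bar{a} = \bar{b} = 2$ the odds ratio is $u = 4/9$, and the two functions evaluate to $16/97$ and $1300/97$ respectively.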

1.9e3. The distribution of n for double dichotomies. In this section we shall be 
concerned with the probability of reaching a decision with exactly n pairs of 
observations. 

Let $a$ and $b$ be two positive integers and let the sequential test be defined by
the decision function $Z_n^* = \sum_{\alpha=1}^{n} z_\alpha$ where $z_\alpha$ takes on the values $-1$, $0$, and $1$ with
probabilities $P_1$, $P_2$, and $P_3$, respectively. In terms of double dichotomies,
$Z_n^* = d_2 - d_1$ where $d_2$ and $d_1$ are the cumulative number of defectives obtained
sequentially from $\pi_2$ and $\pi_1$ respectively, and $P_1 = p_1 q_2$, $P_2 = p_1 p_2 + q_1 q_2$,
$P_3 = p_2 q_1$, where $p_1$ is the fraction defective yielded by $\pi_1$ and $p_2$ the fraction
defective yielded by $\pi_2$.

By the Fundamental Identity we have for any $t$ in the complex plane for which
$|\varphi(t)| \geq 1$,

(1.913)    $L_u e^{-bt} E_1[\varphi(t)]^{-n} + (1 - L_u) e^{at} E_2[\varphi(t)]^{-n} = 1$

where $L_u$ is the probability that $Z_n^* = -b$ when $p_1$ and $p_2$ are such that $\dfrac{P_1}{P_3} = u$,
$E_1$ and $E_2$ are the appropriate conditional expectations, and

(1.914)    $\varphi(t) = P_1 e^{-t} + P_2 + P_3 e^{t}.$

If we examine Wald's proof of Lemma II [2], we see that $\varphi(t) \geq 1$ for all real
values of $t$ which lie outside the open interval $(0, h)$ where $h$ is the root of the
equation $\varphi(t) = 1$. Hence, it must follow that the Fundamental Identity (1.913)
must also hold for all real values of $t$ with the possible exception of the open interval
$(0, h)$. This fact will be used in the subsequent discussion.

We shall first obtain the distribution of $n$ when $a = \infty$. From equation
(1.910) we see that when $a$ approaches $\infty$, $L_u$ approaches 1 for $u \geq 1$ and $u^b$ for
$u < 1$. We shall assume that $u \geq 1$. Then for $t$ negative and $a = \infty$, the
Fundamental Identity (1.913) becomes

(1.915)    $e^{-bt} E[\varphi(t)]^{-n} = 1$

or

(1.916)    $E[\varphi(t)]^{-n} = e^{bt}.$

Now for all $u > 1$, $P_1 > P_3$, and hence $Ez = P_3 - P_1$ is negative. Since the
real roots of $\varphi(t) = 1$ are opposite in sign to $Ez$, it must follow that (1.916) holds
for all $t$ in the interval $(-\infty, 0)$. Now set $e^t = x$. Then (1.916) can be written
as

(1.917)    $E\left(\dfrac{P_1}{x} + P_2 + P_3 x\right)^{-n} = x^b$

and (1.917) is valid for all $x$ in the interval $0 < x < 1$.
Now set

(1.918)    $\dfrac{P_1}{x} + P_2 + P_3 x = \dfrac{1}{\tau}.$

Then for any specified value of $\tau$ there will be two values of $x$, say $x_1(\tau)$ and $x_2(\tau)$.
As $\tau$ approaches 0, one of these values of $x$ will approach zero and the other
infinity. Let $x_1(\tau)$ be the value of $x$ in (1.918) which approaches zero as $\tau$
approaches zero. Substituting (1.918) in (1.917) we get

(1.919)    $E\tau^n = [x_1(\tau)]^b.$

But $E\tau^n$ is the generating function of $n$. Hence if we could expand $E\tau^n$ as a
power series in $\tau$, then the probability that $Z_n^* = -b$ in exactly $n$ steps would be given
by the coefficient of $\tau^n$. We are thus led to consider the expansion of $[x_1(\tau)]^b$
in a power series in $\tau$.

We multiply (1.918) by $\tau x$ and get

(1.920)    $x = \tau(P_3 x^2 + P_2 x + P_1).$

Then since $x_1(\tau)$ approaches 0 as $\tau$ approaches 0, we can expand $[x_1(\tau)]^b$ by Lagrange
formula,¹⁰ and get

(1.921)    $[x_1(\tau)]^b = \sum_{m=1}^{\infty} \dfrac{\tau^m}{m!} \dfrac{d^{m-1}}{d\xi^{m-1}} \left[ b\,\xi^{b-1} (P_1 + P_2 \xi + P_3 \xi^2)^m \right]_{\xi=0}$

where the expansion is valid for $x_1(\tau)$ sufficiently close to zero. Hence, if $P_n(b)$
is the probability that exactly $n$ pairs of observations are required to reach a
decision, then

(1.922)    $P_n(b) = \dfrac{b}{n!} \dfrac{d^{n-1}}{d\xi^{n-1}} \left[ \xi^{b-1} (P_1 + P_2 \xi + P_3 \xi^2)^n \right]_{\xi=0}.$

Now

(1.923)    $\dfrac{d^{n-1}}{d\xi^{n-1}} \left[ \xi^{b-1} (P_1 + P_2 \xi + P_3 \xi^2)^n \right]_{\xi=0} = \sum_{i=0}^{n} \dfrac{n!}{i!\,(n-i)!} \sum_{j=0}^{n-i} \dfrac{(n-i)!}{j!\,(n-i-j)!}\, P_1^{j} P_2^{n-i-j} P_3^{i} \left[ \dfrac{d^{n-1}}{d\xi^{n-1}}\, \xi^{n+i-j+b-1} \right]_{\xi=0}.$

But

(1.924)    $\left[ \dfrac{d^{n-1}}{d\xi^{n-1}}\, \xi^{n+i-j+b-1} \right]_{\xi=0} = 0$

unless $n = n + i - j + b$, i.e., $j = i + b$, in which case

(1.925)    $\left[ \dfrac{d^{n-1}}{d\xi^{n-1}}\, \xi^{n-1} \right]_{\xi=0} = (n-1)!$

Also, since the subscript $j$ ranges from 0 to $n - i$, it must follow that $j \leq n - i$.
Hence, $i + b \leq n - i$, or $i \leq \dfrac{n-b}{2}$. Substituting (1.924) and (1.925) into (1.923)
and simplifying, we get for $P_n(b)$

(1.926)    $P_n(b) = b \sum_{i=0}^{m} \dfrac{(n-1)!\; P_1^{i+b} P_2^{n-2i-b} P_3^{i}}{i!\,(i+b)!\,(n-2i-b)!}$

where $m = \dfrac{n-b}{2}$ when $n - b$ is even and $m = \dfrac{n-b-1}{2}$ when $n - b$ is odd.

10 See, for example, Mathematical Analysis, Vol. 1 (paragraph 189), by Goursat-Hedrick.
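
Formula (1.926) can be checked against a direct step-by-step propagation of the walk. The Python sketch below is an illustration and not part of the original paper; both function names are hypothetical, and the brute-force routine is a plain recursion on the distribution of $Z_n^*$ with a single absorbing boundary at $-b$.

```python
from math import factorial

def first_passage_prob(n, b, P1, P2, P3):
    """P_n(b) of (1.926): the walk first reaches -b at step n (a infinite)."""
    if n < b:
        return 0.0
    total = 0.0
    for i in range((n - b) // 2 + 1):  # i = 0, ..., m
        coeff = factorial(n - 1) / (
            factorial(i) * factorial(i + b) * factorial(n - 2 * i - b)
        )
        total += coeff * P1 ** (i + b) * P2 ** (n - 2 * i - b) * P3 ** i
    return b * total

def first_passage_by_recursion(n, b, P1, P2, P3):
    """Same probability by propagating the walk, absorbing at -b."""
    dist = {0: 1.0}   # position -> probability of being there, not yet absorbed
    hit = 0.0
    for _ in range(n):
        hit = 0.0
        nxt = {}
        for pos, pr in dist.items():
            for step, p in ((-1, P1), (0, P2), (1, P3)):
                new = pos + step
                if new == -b:
                    hit += pr * p          # absorbed on this step
                else:
                    nxt[new] = nxt.get(new, 0.0) + pr * p
        dist = nxt
    return hit

P1, P2, P3 = 0.3, 0.5, 0.2   # illustrative values with P1 > P3
for n in range(1, 9):
    print(n, first_passage_prob(n, 2, P1, P2, P3),
          first_passage_by_recursion(n, 2, P1, P2, P3))
```

The integer quotient `(n - b) // 2` reproduces the upper limit $m$ in both the even and the odd case.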


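We shall now obtain the distribution of $n$ when $a$ is finite.
As before, let $x_1(\tau)$ and $x_2(\tau)$ be the roots of the equation (1.918). Then from
(1.913) we have

(1.927)    $L_u [x_1(\tau)]^{-b} E_1 \tau^n + (1 - L_u) [x_1(\tau)]^{a} E_2 \tau^n = 1,$

(1.928)    $L_u [x_2(\tau)]^{-b} E_1 \tau^n + (1 - L_u) [x_2(\tau)]^{a} E_2 \tau^n = 1.$

Solving for $E_1 \tau^n$ and $E_2 \tau^n$ from (1.927) and (1.928) we get

(1.929)    $L_u E_1 \tau^n = \dfrac{[x_1(\tau) x_2(\tau)]^{b} \left[ x_2(\tau)^{a} - x_1(\tau)^{a} \right]}{x_2(\tau)^{a+b} - x_1(\tau)^{a+b}},$

(1.930)    $(1 - L_u) E_2 \tau^n = \dfrac{x_2(\tau)^{b} - x_1(\tau)^{b}}{x_2(\tau)^{a+b} - x_1(\tau)^{a+b}}.$
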
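We shall first obtain the probability $Q_n(b)$ that $Z_n^* = -b$. This is given by the
coefficient of $\tau^n$ in the expansion of $L_u E_1 \tau^n$ in a power series in $\tau$. From (1.918)
we see that $x_1(\tau) x_2(\tau) = \dfrac{P_1}{P_3}$. Hence we can write (1.929) as

(1.931)    $L_u E_1 \tau^n = \dfrac{x_1(\tau)^{b} - \left(\dfrac{P_3}{P_1}\right)^{a} x_1(\tau)^{2a+b}}{1 - \left(\dfrac{P_3}{P_1}\right)^{b+a} x_1(\tau)^{2a+2b}}.$

Applying Lagrange formula, we get for $Q_n(b)$

(1.932)    $Q_n(b) = \dfrac{1}{n!} \dfrac{d^{n-1}}{d\xi^{n-1}} \left[ (P_1 + P_2 \xi + P_3 \xi^2)^n f'(\xi) \right]_{\xi=0}$

where

(1.933)    $f(\xi) = \dfrac{\xi^{b} - \left(\dfrac{P_3}{P_1}\right)^{a} \xi^{2a+b}}{1 - \left(\dfrac{P_3}{P_1}\right)^{b+a} \xi^{2a+2b}}.$

But $f(\xi)$ can be expanded in a power series in $\xi$,

(1.934)    $f(\xi) = \sum_{k=0}^{\infty} \left(\dfrac{P_3}{P_1}\right)^{k(a+b)} \xi^{(2k+1)b+2ka} - \sum_{k=0}^{\infty} \left(\dfrac{P_3}{P_1}\right)^{kb+(k+1)a} \xi^{(2k+1)b+(2k+2)a}.$

Hence

(1.935)    $Q_n(b) = \dfrac{1}{n!} \sum_{k=0}^{\infty} [(2k+1)b + 2ka] \left(\dfrac{P_3}{P_1}\right)^{k(a+b)} \dfrac{d^{n-1}}{d\xi^{n-1}} \left[ \xi^{(2k+1)b+2ka-1} (P_1 + P_2 \xi + P_3 \xi^2)^n \right]_{\xi=0}$
$\qquad\qquad - \dfrac{1}{n!} \sum_{k=0}^{\infty} [(2k+1)b + (2k+2)a] \left(\dfrac{P_3}{P_1}\right)^{kb+(k+1)a} \dfrac{d^{n-1}}{d\xi^{n-1}} \left[ \xi^{(2k+1)b+(2k+2)a-1} (P_1 + P_2 \xi + P_3 \xi^2)^n \right]_{\xi=0}.$

Comparing (1.935) with (1.922) we see that

(1.936)    $Q_n(b) = P_n(b) - \left(\dfrac{P_3}{P_1}\right)^{a} P_n(b + 2a) + \left(\dfrac{P_3}{P_1}\right)^{b+a} P_n(3b + 2a) - \cdots,$

the terms in the series being alternately of the form

$\left(\dfrac{P_3}{P_1}\right)^{k(a+b)} P_n[(2k+1)b + 2ka]$ and $-\left(\dfrac{P_3}{P_1}\right)^{kb+(k+1)a} P_n[(2k+1)b + (2k+2)a]$, for $k = 0, 1, \cdots$.

The series stops by itself as soon as the argument of $P_n$ becomes greater than $n$.

If we compare (1.930) with (1.929), we see that the probability that $Z_n^* = a$
with exactly $n$ pairs of observations is given by (1.936) with $a$ and $b$ interchanged
and the result multiplied by $(P_3/P_1)^a$.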

It is to be noted that the problem of double dichotomies is similar to the following
problem in games of chance. Two players A and B, possessing $a$ and $b$
dollars, respectively, are playing a game of chance which admits a draw. The
stake is one dollar per game. The probability that A will win one dollar is
$P_1$, the probability that B will win one dollar is $P_3$ and the probability of a draw
is $P_2$. In terms of this game, $L_u$ given by (1.910) is the probability that B
will be ruined in the long run, and $Q_n(b)$ in (1.936) is the probability that B will
be ruined in exactly $n$ games.
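
The alternating series (1.936) can be evaluated and compared with a direct propagation of the game between the two boundaries. The Python sketch below is an illustration and not part of the original paper; the function names are hypothetical, and `P_n` restates (1.926) so that the block is self-contained.

```python
from math import factorial

def P_n(n, b, P1, P2, P3):
    """First-passage probability of (1.926), one boundary at -b."""
    if n < b:
        return 0.0
    total = 0.0
    for i in range((n - b) // 2 + 1):
        coeff = factorial(n - 1) / (
            factorial(i) * factorial(i + b) * factorial(n - 2 * i - b)
        )
        total += coeff * P1 ** (i + b) * P2 ** (n - 2 * i - b) * P3 ** i
    return b * total

def ruin_in_exactly_n(n, a, b, P1, P2, P3):
    """Q_n(b) of (1.936): B, holding b dollars, is ruined at game n."""
    r = P3 / P1
    total = 0.0
    k = 0
    while (2 * k + 1) * b + 2 * k * a <= n:   # later terms vanish
        total += r ** (k * (a + b)) * P_n(n, (2 * k + 1) * b + 2 * k * a, P1, P2, P3)
        arg = (2 * k + 1) * b + (2 * k + 2) * a
        if arg <= n:
            total -= r ** (k * b + (k + 1) * a) * P_n(n, arg, P1, P2, P3)
        k += 1
    return total

def ruin_by_recursion(n, a, b, P1, P2, P3):
    """Same probability by propagating the walk between -b and a."""
    dist = {0: 1.0}
    hit = 0.0
    for _ in range(n):
        hit = 0.0
        nxt = {}
        for pos, pr in dist.items():
            for step, p in ((-1, P1), (0, P2), (1, P3)):
                new = pos + step
                if new == -b:
                    hit += pr * p                       # B is ruined
                elif new < a:
                    nxt[new] = nxt.get(new, 0.0) + pr * p
                # new == a: A is ruined, the path is dropped
        dist = nxt
    return hit

P1, P2, P3 = 0.3, 0.5, 0.2   # illustrative values only
for n in range(1, 9):
    print(n, ruin_in_exactly_n(n, 2, 3, P1, P2, P3),
          ruin_by_recursion(n, 2, 3, P1, P2, P3))
```

The loop ends when the first argument of a term exceeds $n$, which is the sense in which the series stops by itself.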

For a discussion of games of chance which do not permit a draw, see Introduc- 
tion to Mathematical Probability, Chapter VIII, by J. V. Uspensky. The develop- 
ment presented above is in some respects similar to that given in Uspensky’s 
book. In Part II, we shall give a different and more general approach to the 
problem of deriving the distribution of n for sequential tests in which the variate 
takes on a finite number of integral values.

AN APPROACH FOR QUANTIFYING PAIRED COMPARISONS AND
RANK ORDER¹


By Louis Guttman
Cornell University and War Department


1. Summary. Research for the Army demobilization point system evolved 
a new approach to paired comparisons and rank order. Each of N individuals 
compares or ranks n things; the problem is to determine a numerical value for 
each of the n things that will best represent the comparisons in some sense. The 
new criterion adopted is that the numerical values be determined so as best to 
distinguish between those things judged higher and those judged lower for each 
individual. Least-squares is employed in the analysis, and the solution appears 
in the form of the latent vector associated with the largest root of a matrix ob- 
tained from the comparisons or rankings. 

This approach applies to the conventional problem of ordinary comparisons, 
the numerical solution being easily obtainable by simple iterations; the conven- 
tional use of hypothetical variables and unverified hypotheses is avoided. The 
Army point system is an example of a new and more complicated class of prob- 
lems; the same principle for the solution applies here, only more details occur 
in the derivations and computations. 


2. Introduction. The problem of paired comparisons arises when it is desired 
to obtain numerical values for a set of n things, with respect to one characteristic, 
such that these values will represent the judgments of a population of N in- 
dividuals. 

One procedure for obtaining the judgments is to have the individuals compare 
the things two at a time and to judge for each comparison which of the two 
things should be given the higher rank. An alternative procedure is to have 
each individual rank all the n things simultaneously. Such a ranking implies 
judging all the n(n — 1)/2 comparisons at once; hence, the two procedures are 
substantially equivalent. Two noteworthy differences between the procedures 
are: (a) comparing two things at a time allows inconsistencies to appear within
judgments of an individual, and (b) it is sometimes harder in practice for people 
to judge n things simultaneously than to compare them two at a time. 

The problem of quantification, of course, is identical for both procedures, so 
we do not distinguish between them in this paper. The judgments vary from 
person to person (and possibly within a person), and the problem is to determine 
a set of numerical values for the things being compared that will in some sense 
best represent or average the judgments of the whole population. 


1 Adapted from Report D-3, “An approach for quantifying paired comparisons,” Research
Branch, Information and Education Division, Headquarters Army Service Forces,
Washington, D. C., 1945.







In some situations, the things being compared may be single items or objects; 
this we shall call the case of ordinary comparisons. In other situations, the 
things may be combinations of items or objects. 

This paper is devoted to the presentation of a general approach to quantifying 
comparisons or rank orders, with particular application to ordinary comparisons 
and to the comparison of combinations of two things. It seems to differ from 
previous approaches in at least two important respects: (a) it is based on but one 
simple principle, namely, that the quantification shall be the one best able to 
reproduce the judgment of each person in the population on each comparison; and, 
as a consequence, (b) the approach yields solutions not only to the traditional 
case of ordinary comparisons, but also to more complex cases that do not seem 
to have been discussed previously. 

An example of a major practical use of this approach is with respect to the 
demobilization score card of the United States Army. The problem was to 
determine the number of points to assign each of the variables on the score card 
according to the opinions of the soldiers themselves. The research on this was 
based on a form of paired comparisons more complicated than the ordinary one, 
and had additional complications of curvilinearities of various sorts in the data. 
Our approach handles such problems as well as the problem of ordinary com- 
parisons. 

Let us describe the score card problem in somewhat more detail. In a survey 
of enlisted men throughout the world by means of a questionnaire administered 
by field teams of the Research Branch, it was found that there were five variables 
that the men thought should receive consideration on the score card to determine 
order of demobilization: length of time in the Army, length of time overseas, 
amount of combat, age, and number of children. 

The problem now was to determine how much weight to give each of these 
variables in obtaining total scores. According to ordinary paired comparisons, 
one would ask, for example, “Who should get out first after the war: a man
who has two children or a man who has been in two battles?” But respondents
refuse to judge such a comparison because the battle experience of the first man 
is not specified, nor is the number of progeny of the second man, so that there is 
insufficient basis for judgment. 

Therefore, in the actual research, judgments were asked on each of ten com- 
parisons put in the following form: 

“Here are three men of the same age, all overseas the same length of time. 
Check the one you would want to have let out first: 


—— A single man....through two campaigns of combat 
—— A married man with no children .... through one campaign of combat 
—— A married man with two children ....not in combat.” 


Each variable was compared with every other one in this fashion. 

The equations were derived for computing the relative number of points to 
assign to each month in the army, each month overseas, etc., which would be 
most consistent according to our principle. These are essentially the equations 
developed in section 6 of this paper.

The results showed strong curvilinearities in the men’s judgments. Amount
of combat received one amount of emphasis when compared with age, and another 
amount of emphasis when compared with number of children. Since the score 
card would be too complicated in practice if curvilinear scoring were used, 
equations were derived for the linear scoring scheme that would be most con- 
sistent according to our principle. These are essentially the equations derived 
in section 7. The weights arising out of the research were computed from such 
equations. 

The variable age received a slight negative weight, which justified dropping 
it from the score card. The weights the Army finally adopted for the remaining 
factors were modified from the research weights, but yield essentially the same 
results as the research weights. Demobilization scores obtained from the one 
system of weights correlate very highly with scores obtained from the other. 

It can now be revealed that the Army’s modification was essentially to reverse 
the weights for children and battles. In subsequent attitude surveys on how 
well the soldiers liked the point system [8], a major complaint was found to be 
that battles got too little weight compared with babies! 


3. The basic principle. Our basic principle in deriving numerical values (let
us call them “x-values”) for the things being compared requires that the x-values
of things a given person judges higher than other things should be as
different as possible from the x-values of the things he judges to be lower than
other things. This will be achieved if we make the x-values of things judged
higher as homogeneous as possible among themselves, and the x-values of things
judged lower as homogeneous as possible among themselves, for each individual.
In the language of analysis of variance, our principle calls for minimizing the
variation within individuals, compared with that within the group as a whole.
The resulting x-values will tend to be the best for reproducing the judgment of
each individual on each comparison with a minimum overall proportion of
errors of reproduction [3, pp. 342-343]. The smaller this overall proportion of
error, the better the quantification represents the data. Least squares is used
for convenience for measuring variation in deriving the equations.

The previous literature, on ordinary paired comparisons, seems to have 
concentrated largely on the problem of estimating the differences between means 
of hypothetical variables assumed to underlie the judgments. Thurstone has 
shown that by using assumptions of normality of distribution, equality of vari- 
ances, and zero correlations among hypothetical variables, it is possible to 
estimate relative distances between means for some kinds of data. 


2 This principle for quantification was suggested by previous work on scale analysis; 
see [3]. This theory has been developed further by the definition of a perfect scale in 
[4]. The equations for the perfect scale have interesting properties that may be related 
to paired comparisons; these equations are being prepared for publication. The referees 
have called my attention to related work on quantification by R. A. Fisher in [1, p. 283]. 

3 A good survey of the previous work, including that of Thurstone, is given in [2, pp.
217-243]. For more recent work, see [7].

The problem of estimating differences between means is not identical with
that of reproducing individual judgments. For example, it can be shown,
within the same framework of hypothetical variables conventionally used, that
if variances are unequal and/or correlations are unequal then the means of the
hypothetical variables are not in general the best quantification for reproducing
individual judgments; the principal axis of certain product-moments of raw
scores is the best quantification. It is in the special case where variances are
equal, and where correlations are equal (not even necessarily equal to zero),
that the principal axis is the set of means. Proof of this is given in the appendix.

The approach of this paper does not use hypothetical variables, but inquires 
directly as to what numerical values can be derived from the observations that 
will best reproduce those observations. 

In the next section is treated the case of ordinary comparisons. The more 
complicated problem of the demobilization score card is formalized in section 5, 
and the equations for its unrestricted solution are derived in section 6. Since 
the unrestricted solution brings out curvilinearities that may be present, and 
since the score card in practice required a linear scoring scheme, equations for 
the most consistent linear quantification are derived in section 7. These are 
essentially the equations used in the research on the weights for the score card. 

The appendix shows a distinction between the conventional principle of 
estimating mean differences of hypothetical variables and the present principle 
of representing the comparisons of each individual. 


4. The case of ordinary comparisons. Paired comparisons as treated in the 
literature seem concerned largely with the ordinary case where separate things 
are compared, rather than where combinations of things are compared. Our 
principle covers the ordinary case as well as more complex cases, and we shall 
treat the ordinary case first since it involves less details. 

Let $O_1, O_2, \cdots, O_n$ be the $n$ things to be compared, where the assigning of
subscripts is arbitrary. Each of $N$ individuals is asked to make judgments of
the form that $O_j$ is higher than (or lower than) $O_k$. For convenience, we assume
the rules of the experiment to exclude judgments of equality. We shall also
assume that all people compare all the pairs. Hence, there are $N$ sets of
$n(n-1)/2$ comparisons. Considering each comparison as comprising two judgments,
one of “higher than” for one object and one of “lower than” for the other, there
is a total of $Nn(n-1)$ judgments in the experiment.

The judgments of all the individuals on all the comparisons can be represented
compactly as follows. Let

(4.1)    $e_{ijk} = \begin{cases} 1 & \text{if individual } i \text{ judges } O_j > O_k \\ 0 & \text{if individual } i \text{ judges } O_j < O_k \\ 0 & \text{if } j = k. \end{cases}$

The ranges of subscripts, whether free or dummy, will always be:

(4.2)    $i = 1, 2, \cdots, N; \qquad j, k = 1, 2, \cdots, n,$

so that the ranges will not be explicitly stated again.
Definition (4.1) implies that if $e_{ijk} = 1$, then $e_{ikj} = 0$, and that

(4.3)    $e_{ijk} + e_{ikj} = 1, \quad (j \neq k).$

Let $f_{ij}$ be the number of things individual $i$ judged to be lower than $O_j$, and
let $g_{ij}$ be the number of things he judged to be higher than $O_j$. Then

(4.4)    $f_{ij} = \sum_k e_{ijk}, \qquad g_{ij} = \sum_k e_{ikj}.$

From (4.3) and (4.4), we have

(4.5)    $f_{ij} + g_{ij} = n - 1.$

Let $F$ be the total number of comparisons made by each person; then

(4.6)    $F = n(n-1)/2 = \sum_k f_{ik} = \sum_k g_{ik}.$

Let $c$ be the number of times each $O_j$ was judged in the whole experiment, and
let $C$ be the total number of judgments in the experiment:

(4.7)    $c = N(n-1) = \sum_i (f_{ij} + g_{ij}), \qquad C = Nn(n-1).$

Both $c$ and $C$ count each comparison as two judgments, one of “lower than”
and one of “higher than.”
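
The counts $f_{ij}$ and $g_{ij}$ and the totals $F$, $c$ and $C$ can be tabulated as in the Python sketch below. This is an illustration and not part of the original paper; the function names and the data are hypothetical, and the judgments are assumed to come from complete rankings without ties, so that every individual is internally consistent.

```python
def judgment_array(rankings):
    """e[i][j][k] = 1 if individual i judges O_j > O_k, else 0, as in (4.1).

    rankings[i][j] is the rank individual i gives to thing j,
    a larger rank meaning "judged higher" (an assumed convention).
    """
    n = len(rankings[0])
    return [[[1 if r[j] > r[k] else 0 for k in range(n)] for j in range(n)]
            for r in rankings]

def counts(e):
    """f[i][j] and g[i][j] of (4.4)."""
    n = len(e[0])
    f = [[sum(ei[j][k] for k in range(n)) for j in range(n)] for ei in e]
    g = [[sum(ei[k][j] for k in range(n)) for j in range(n)] for ei in e]
    return f, g

# Three hypothetical individuals ranking three things.
rankings = [[3, 2, 1], [3, 2, 1], [2, 3, 1]]
e = judgment_array(rankings)
f, g = counts(e)

N, n = len(rankings), len(rankings[0])
F = n * (n - 1) // 2          # comparisons per person, (4.6)
c = N * (n - 1)               # judgments involving each thing, (4.7)
C = N * n * (n - 1)           # judgments in the whole experiment, (4.7)
print(f, g, F, c, C)
```

Each row of `f` and of `g` sums to $F$, and the entries of `f` and `g` for one thing sum over individuals to $c$, as (4.5) through (4.7) require.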

The means and variances to be considered are defined as follows. Let $x_j$
be the numerical value to be derived for $O_j$ on the basis of the comparisons.
Let $t_i$ be the mean of the x-values of the things individual $i$ ranked higher than
the other things, weighted by the respective frequencies of the judgments, and
let $y_i$ be the sum of squares of deviations from their mean of these x-values:

(4.8)    $t_i = \dfrac{1}{F} \sum_k x_k f_{ik},$

(4.9)    $y_i = \sum_k (x_k - t_i)^2 f_{ik} = \sum_k x_k^2 f_{ik} - t_i^2 F.$

Similarly, let $u_i$ and $z_i$ be the mean and sum of squares respectively for the x-values
of the things individual $i$ ranked lower than other things:

(4.10)    $u_i = \dfrac{1}{F} \sum_k x_k g_{ik},$

(4.11)    $z_i = \sum_k (x_k - u_i)^2 g_{ik} = \sum_k x_k^2 g_{ik} - u_i^2 F.$

Let $V$ be the mean of all the x-values in the experiment, and let $W$ be the sum
of squares of deviations from their mean of the x-values:

(4.12)    $V = \dfrac{1}{C} \sum_k x_k c = \dfrac{1}{n} \sum_k x_k,$

(4.13)    $W = \sum_k (x_k - V)^2 c = c \sum_k x_k^2 - V^2 C.$

$W$ is the total sum of squares for the experiment. Let $R$ be the sum of squares
between individuals, and let $S$ be the sum of squares within individuals:

(4.14)    $R = F \sum_i \left[ (t_i - V)^2 + (u_i - V)^2 \right] = F \sum_i (t_i^2 + u_i^2) - V^2 C,$

(4.15)    $S = \sum_i (y_i + z_i) = W - R.$


Our principle is to quantify the judgments by obtaining the x-values that will
minimize the variation within individuals compared to that of the group as a whole.
This means making $S$ as small as possible compared with $W$, which is equivalent
to making $R$ as large as possible compared with $W$.

Therefore, if we define the correlation ratio $E$ by

(4.16)    $E^2 = 1 - S/W,$

the problem is to determine the $x_j$ that will maximize $E^2$.
A convenient formula for $E^2$ is, from (4.15) and (4.16),

(4.17)    $E^2 = R/W.$

Since $E^2$ is invariant with respect to translations of the x-values, we can without
loss of generality set

(4.18)    $V = 0.$

Then we can write from (4.14) and (4.13), respectively,

(4.19)    $R = F \sum_i (t_i^2 + u_i^2),$

(4.20)    $W = c \sum_k x_k^2.$

To find the maximizing values $x_j$ for $E^2$, we differentiate the right member of
(4.17) with respect to the $x_j$, set the derivatives equal to zero, and obtain the
stationary equations

(4.21)    $\dfrac{\partial R}{\partial x_j} = E^2 \dfrac{\partial W}{\partial x_j}.$

The derivatives of $R$ can be evaluated by differentiating the right member of
(4.19) with the aid of (4.8):

(4.22)    $\dfrac{\partial R}{\partial x_j} = \dfrac{2}{F} \sum_k x_k \sum_i (f_{ij} f_{ik} + g_{ij} g_{ik}).$

From (4.20), the derivatives of $W$ are

(4.23)    $\dfrac{\partial W}{\partial x_j} = 2 c x_j.$


If we let

(4.24)    $H_{jk} = \dfrac{1}{cF} \sum_i (f_{ij} f_{ik} + g_{ij} g_{ik}),$

then (4.21) can be re-written from (4.22), (4.23), and (4.24) as:

(4.25)    $\sum_k x_k H_{jk} = E^2 x_j.$

Equations (4.25) are the equations to be solved numerically for the maximizing
$x_j$.

Before indicating a procedure for the numerical solution, let us first verify
that a solution of (4.25) will satisfy (4.18). Summing both members of (4.25)
over $j$, and using (4.24) and relations among the notation previously defined,
we get

$\sum_k x_k = E^2 \sum_j x_j,$

or, from (4.12),

(4.26)    $(1 - E^2) V = 0.$

Therefore, if $E^2 \neq 1$, we must have $V = 0$. Since a perfect correlation ratio
will not in general occur in practice, condition (4.18) will in general be satisfied
by a solution of (4.25).

There is always a trivial solution of (4.25) for which $E^2$ is formally equal to
unity. This is $x_j \equiv 1$. For this trivial solution, $t_i = u_i = 1$; $R = W = C$;
$E^2 = 1$; and (4.25) is satisfied. Of course, $E$ is not an actual correlation ratio
for this trivial solution.

The non-trivial solution of (4.25) can be carried out with the aid of matrix
algebra. Let x be a row vector of the n elements $x_j$, and let H be the n × n
symmetric matrix $\| H_{jk} \|$. H is not only symmetric but Gramian, since its
elements are product sums. Now (4.25) becomes the matric equation

(4.27)    $xH = E^2 x.$


Equation (4.27) shows that x is a latent vector of H, and $E^2$ is a latent root to
which this vector corresponds. Since we want the largest possible correlation
ratio, we seek the largest of the non-trivial roots. If the two largest non-trivial
roots are not equal, which should be the general case in practice, then there is a
unique vector associated with the largest root which is the solution to our
problem.

The numerical solution of (4.27) can be carried out by the simple iterative 
technique for latent roots and vectors (see, for example [6]). The iterations 
converge in general to the vector associated with the largest root. To avoid 
convergence to the trivial solution (which formally has the largest root), the
trial vectors should be adjusted to satisfy (4.18); then they will converge in 
general to the vector associated with the largest non-trivial root. 

A good way to choose a first trial vector is first to guess what the rank order of
the x-values will be. Let $r_j$ be the guessed rank of $x_j$, the $r_j$ comprising the
integers from one to n. If n is odd, then as the first trial $x_j$ use $r_j - (n + 1)/2$.
If n is even, then as the first trial $x_j$ use $2r_j - n - 1$.

A marginal check on the internal consistency of the judgments of the population
is to compare each difference $(x_j - x_k)$ with the corresponding difference
$(\sum_i e_{ijk} - \sum_i e_{ikj})$. If the population's judgments are sufficiently consistent,
the signs of the two differences will be alike for all the comparisons. $\sum_i e_{ijk}$
is the frequency with which $O_j$ is judged greater than $O_k$, and can be used as a
basis for guessing the ranks of $x_j$ and $x_k$.
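
The iterative technique for (4.27) and the rule for the first trial vector can be sketched as follows. This is only an illustration in Python, not the procedure of [6]: the function names, the fixed iteration count, and the use of an unweighted mean when adjusting trial vectors to satisfy (4.18) are assumptions made for the example.

```python
def first_trial_vector(ranks):
    """Centered first trial vector from guessed ranks 1..n (rule stated in the text)."""
    n = len(ranks)
    if n % 2 == 1:
        return [r - (n + 1) // 2 for r in ranks]
    return [2 * r - n - 1 for r in ranks]


def largest_nontrivial_root(H, x0, iterations=200):
    """Power iteration on a symmetric matrix H (list of lists).

    Each trial vector is re-centered to mean zero before multiplying, so the
    iteration avoids the trivial solution x_j = 1 (assumed unweighted mean).
    Returns (estimated root, normalized vector).
    """
    n = len(H)
    x = [float(v) for v in x0]
    root = 0.0
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]
        xx = sum(v * v for v in x)
        if xx == 0.0:
            raise ValueError("trial vector vanished after centering")
        y = [sum(H[j][k] * x[k] for k in range(n)) for j in range(n)]
        # Rayleigh quotient of the current trial vector
        root = sum(a * b for a, b in zip(x, y)) / xx
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return root, x
```

For instance, `largest_nontrivial_root(H, first_trial_vector([3, 1, 2]))` starts the iterations from the guessed rank order 3, 1, 2. Re-centering at every step is a cautious choice: in exact arithmetic one adjustment of the first trial vector would suffice, but rounding errors can reintroduce a component along the trivial solution.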





5. Comparing combinations of two things. The problem of the score card is
but one example of a class of problems that can be formalized as follows. Consider
a set of n items, where the jth item has $m_j$ categories. Let $O_{jp}$ be the pth
category of the jth item $(p = 1, 2, \ldots, m_j;\ j = 1, 2, \ldots, n)$. The $O_{jp}$ may be
either qualitative or quantitative, and the order of subscripts assigned the
categories can be arbitrary.

Each of N individuals is asked to make judgments of the form that the combination
$(O_{jp}, O_{kr})$ is greater than (or less than) the combination $(O_{jq}, O_{ks})$.
We shall assume that all people compare each of the pairs of combinations, and
that the rules of the experiment exclude judgments of equality.

The judgments of all the individuals on all the comparisons can be represented
compactly as follows. Let

(5.1)    $e_{ijk/pr,qs} = \begin{cases} 1 & \text{if individual } i \text{ judges } (O_{jp}, O_{kr}) > (O_{jq}, O_{ks}) \\ 0 & \text{otherwise.} \end{cases}$

Here and throughout this paper the ranges of subscripts, whether free or dummy,
will always be as follows:

(5.2)    $i = 1, 2, \ldots, N$
         $j, k = 1, 2, \ldots, n$
         $p, q, r, s = 1, 2, \ldots, m_j$ (or $m_k$, as the case may be),

so that the ranges will not be explicitly stated again.
Definition (5.1) implies the symmetry

(5.3)    $e_{ijk/pr,qs} = e_{ikj/rp,sq},$

and that

(5.4)    $e_{ijk/pr,qs} + e_{ijk/qs,pr} = \begin{cases} 0 & \text{if individual } i \text{ omits the comparison of } (O_{jp}, O_{kr}) \text{ with } (O_{jq}, O_{ks}) \\ 1 & \text{if he judges these two combinations to be unequal.} \end{cases}$


152 LOUIS GUTTMAN 


Additional notation is defined as follows. Let $a_{ijk/pr}$ be the number of combinations
individual i judged to be lower than $(O_{jp}, O_{kr})$, and let $b_{ijk/pr}$ be the
number of combinations he judged to be higher than $(O_{jp}, O_{kr})$:


(5.5)    $a_{ijk/pr} = \sum_q \sum_s e_{ijk/pr,qs} = a_{ikj/rp}$


(5.6)    $b_{ijk/pr} = \sum_q \sum_s e_{ijk/qs,pr} = b_{ikj/rp}.$


Let $c_{jk/pr}$ be the number of comparisons for all individuals involving $(O_{jp}, O_{kr})$:


(5.7)    $c_{jk/pr} = \sum_i (a_{ijk/pr} + b_{ijk/pr}) = c_{kj/rp}.$


Let $f_{ijp}$ be the number of times that $O_{jp}$ occurred in combinations that were judged
to be higher than other combinations by individual i, and let $g_{ijp}$ be the number
of times $O_{jp}$ occurred in combinations judged lower than others:


(5.8)    $f_{ijp} = \sum_k \sum_r a_{ijk/pr} = \sum_k \sum_r a_{ikj/rp},$
(5.9)    $g_{ijp} = \sum_k \sum_r b_{ijk/pr} = \sum_k \sum_r b_{ikj/rp}.$


Let $A_{jp}$ be the total number of times in the entire experiment that $O_{jp}$ was
judged:

(5.10)   $A_{jp} = \sum_i (f_{ijp} + g_{ijp}) = \sum_k \sum_r c_{jk/pr}.$

Let F be the total number of comparisons made by each person, and let C be
the total number of judgments in the entire experiment (a comparison comprises
two judgments, one of "higher than" and one of "lower than"):


(5.11)   $F = \sum_j \sum_p f_{ijp} = \sum_j \sum_p g_{ijp},$
(5.12)   $C = \sum_j \sum_p A_{jp} = 2NF.$


The means and variances required for the problem are defined as follows.
Let $x_{jp}$ be the numerical value to be derived for $O_{jp}$ from the judgments. Let
$t_i$ be the mean of the x-values of the combinations individual i judged to be
higher than other combinations, weighted by the respective frequencies of such
judgments, and let $u_i$ be the analogous mean of combinations judged lower than
others:


(5.13)   $t_i = \dfrac{1}{F} \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})\, a_{ijk/pr} = \dfrac{2}{F} \sum_k \sum_r x_{kr} f_{ikr},$

(5.14)   $u_i = \dfrac{1}{F} \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})\, b_{ijk/pr} = \dfrac{2}{F} \sum_k \sum_r x_{kr} g_{ikr}.$


Let $y_i$ be the sum of squares of deviations from their mean of these "higher
than" x-values, and let $z_i$ be the analogous sum of squares for the "lower than"
x-values:

(5.15)   $y_i = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr} - t_i)^2 a_{ijk/pr} = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})^2 a_{ijk/pr} - t_i^2 F,$

(5.16)   $z_i = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr} - u_i)^2 b_{ijk/pr} = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})^2 b_{ijk/pr} - u_i^2 F.$

Let V be the mean of all x-values, weighted by their respective frequencies
in the entire experiment, and let W be the sum of squares of deviations from
their mean of these x-values:


(5.17)   $V = \dfrac{1}{C} \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})\, c_{jk/pr} = \dfrac{2}{C} \sum_k \sum_r x_{kr} A_{kr},$

(5.18)   $W = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr} - V)^2 c_{jk/pr} = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})^2 c_{jk/pr} - V^2 C.$

W is the total sum of squares for the experiment. Let R be the sum of squares
between individuals for the experiment, and let S be the sum of squares within
individuals:


(5.19)   $R = \sum_i [(t_i - V)^2 + (u_i - V)^2] F = F \sum_i (t_i^2 + u_i^2) - V^2 C,$


(5.20)   $S = \sum_i (y_i + z_i) = W - R.$
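
The decomposition W = R + S behind (5.20) can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the paper's computing routine: it takes, for each individual, the list of combination scores $x_{jp} + x_{kr}$ judged higher and the list judged lower (each assumed to hold F entries), and the function name and return format are invented for the example.

```python
def correlation_ratio(higher, lower):
    """Sums of squares of (5.13)-(5.20) from per-individual score lists.

    higher[i] / lower[i]: combination scores individual i judged higher / lower.
    Every list is assumed to contain the same number F of entries.
    """
    N = len(higher)
    F = len(higher[0])
    C = 2 * N * F                                   # total number of judgments
    t = [sum(row) / F for row in higher]            # "higher than" means t_i
    u = [sum(row) / F for row in lower]             # "lower than" means u_i
    y = [sum((v - t[i]) ** 2 for v in higher[i]) for i in range(N)]
    z = [sum((v - u[i]) ** 2 for v in lower[i]) for i in range(N)]
    V = (sum(map(sum, higher)) + sum(map(sum, lower))) / C
    W = sum((v - V) ** 2 for rows in (higher, lower) for row in rows for v in row)
    R = F * sum((t[i] - V) ** 2 + (u[i] - V) ** 2 for i in range(N))
    S = sum(y) + sum(z)
    return {"R": R, "S": S, "W": W, "E2": R / W}
```

A call such as `correlation_ratio([[5.0, 6.0], [4.0, 6.0]], [[1.0, 2.0], [2.0, 3.0]])` returns values for which `W` equals `R + S` up to rounding, so `E2` can be read either as R/W or as 1 - S/W.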


Our principle for quantifying the judgments is to derive the x-values that will 
minimize the variation within individuals compared with that within the group 
as a whole. This means making S as small as possible compared with W.

Therefore, if we define the correlation ratio E by


(5.21)   $E^2 = 1 - S/W,$


our problem is to determine the $x_{jp}$ that will maximize $E^2$.
A convenient formula for $E^2$ is, from (5.20) and (5.21),


(5.22)   $E^2 = R/W.$


Since $E^2$ is invariant with respect to translations of the x-values, we can
without loss of generality set


(5.23)   $V = 0.$

Then we can write, from (5.19) and (5.18) respectively,


(5.24)   $R = F \sum_i (t_i^2 + u_i^2)$


(5.25)   $W = \sum_j \sum_k \sum_p \sum_r (x_{jp} + x_{kr})^2 c_{jk/pr}.$

6. The unrestricted maximum. To find the maximizing x-values for $E^2$,
we differentiate the right member of (5.22) with respect to the $x_{jp}$ and set the
derivatives equal to zero. This yields the stationary equations

(6.1)    $\dfrac{\partial R}{\partial x_{jp}} = E^2 \dfrac{\partial W}{\partial x_{jp}}.$


To evaluate the partial derivatives of R, we differentiate the right member of
(5.24), using (5.13) and (5.14), and obtain


(6.2)    $\dfrac{\partial R}{\partial x_{jp}} = \dfrac{8}{F} \sum_k \sum_r x_{kr} \sum_i (f_{ijp} f_{ikr} + g_{ijp} g_{ikr}).$

Similarly for W, we differentiate the right member of (5.25) and obtain

(6.3)    $\dfrac{\partial W}{\partial x_{jp}} = 4\left(x_{jp} A_{jp} + \sum_k \sum_r x_{kr} c_{jk/pr}\right).$

From (6.2) and (6.3), (6.1) can be written as

(6.4)    $\sum_k \sum_r x_{kr} h_{jk/pr} = \tfrac{1}{2} E^2 \left(x_{jp} A_{jp} + \sum_k \sum_r x_{kr} c_{jk/pr}\right),$

where

(6.5)    $h_{jk/pr} = \dfrac{1}{F} \sum_i (f_{ijp} f_{ikr} + g_{ijp} g_{ikr}).$


The numerical solution for the x-values is to be obtained from (6.4).

Before showing a procedure for the numerical solution, let us verify that a
solution of (6.4) will also satisfy (5.23). Summing both members of (6.4) over
j and p, and using (6.5) and relations among the notation laid down in the previous
section, we get


$\sum_k \sum_r x_{kr} A_{kr} = \tfrac{1}{2} E^2 \left(\sum_j \sum_p x_{jp} A_{jp} + \sum_k \sum_r x_{kr} A_{kr}\right)$

or


(6.6)    $(1 - E^2) \sum_k \sum_r x_{kr} A_{kr} = 0.$


From (5.17), this can be written as

(6.7)    $(1 - E^2)V = 0.$


Therefore, if $E^2 \neq 1$, we must have V = 0. Hence, any solution of (6.4) which
does not yield a perfect correlation ratio must have a weighted mean of zero for
the x-values. Since a perfect correlation ratio will not in general occur in
practice, condition (5.23) will in general be satisfied and is no restriction.

It should be noted that there is always a trivial solution for which $E^2$ is formally
equal to unity. The trivial solution is to set $x_{jp} = 1$. Then $t_i = u_i = 2$;
$R = W = 4C$; $E^2 = 1$; and (6.4) is satisfied since it reduces to (6.7). For this
trivial solution, E is of course not an actual correlation ratio.

The non-trivial numerical solution of (6.4) can be carried out in practice with
the aid of matrix algebra. Instead of regarding the $x_{jp}$ as elements of a table
with n rows with $m_j$ elements in the jth row, consider the rows of such a table
placed end to end to form a single row of $M = \sum_j m_j$ elements. Denote this
as the row vector x. Correspondingly, consider the values $h_{jk/pr}$ arranged to
form the elements of a symmetric matrix H of M rows and columns; consider
the M values $A_{jp}$ to be the diagonal elements of an M × M diagonal matrix A;
and consider the values of $c_{jk/pr}$ arranged to form an M × M symmetric matrix C.
Let $\lambda = \tfrac{1}{2}E^2$. Then (6.4) becomes in matric form:


(6.8)    $xH = \lambda(xA + xC) = \lambda x(A + C).$


In the next paragraph it is shown that, in general, (A + C) is non-singular,
so that it has an inverse by which the members of (6.8) can be postmultiplied,
yielding


(6.9)    $xH(A + C)^{-1} = \lambda x.$


This shows that x is a latent vector of $H(A + C)^{-1}$, and $\lambda$ is the latent root to
which this vector corresponds. Since we want the largest possible correlation
ratio, we seek the largest of the non-trivial latent roots. If the two largest non-
trivial roots are not equal, which should ordinarily be the case in practice, then
there will be a unique latent vector associated with the largest root.

It is of interest to show that all the latent roots of $H(A + C)^{-1}$ are real and
non-negative, and that all the latent vectors are real. First, we notice that H
is Gramian, for its elements are product sums. To see that A + C is Gramian,
we notice that from (5.18) and (5.10),


(6.10)   $W = 2\sum_j \sum_p x_{jp}^2 A_{jp} + 2\sum_j \sum_k \sum_p \sum_r x_{jp} x_{kr} c_{jk/pr} - V^2 C,$

or, in matric notation, and transposing members,

(6.11)   $2x(A + C)x' = W + V^2 C.$

Since W is a sum of squares, the right member is clearly non-negative; and hence


(6.12)   $x(A + C)x' \geq 0,$


for all x. Thus, A + C is nonnegative-definite, or Gramian. Furthermore,
A + C is in general nonsingular, because according to (5.17) and (5.18), V and
W cannot vanish simultaneously unless


(6.13)   $(x_{jp} + x_{kr})\, c_{jk/pr} = 0.$


If n ≥ 3, then (6.13) will ordinarily imply that $x_{jp} = 0$, that is, the equality in
(6.12) will hold if and only if x = 0. In such a case, A + C is positive-definite,
or is nonsingular as well as Gramian, and possesses an inverse.

As is well known, the inverse of a Gramian matrix is Gramian (see [5, p. 71],
for example), so that $(A + C)^{-1}$ is Gramian. That the latent roots of
$H(A + C)^{-1}$ are all nonnegative follows from a general theorem that the latent roots of
the product of two Gramian matrices are always nonnegative [5, p. 116]. The
proof of this is brief, and will be repeated here in a slightly different form in
order to prove in addition that the latent vectors are all real. Let G be a symmetric
square root of A + C, so that $G^2 = A + C$. If we postmultiply both
members of (6.9) by G, we can write the results as:


(6.14)   $(xG)(G^{-1}HG^{-1}) = \lambda(xG).$


This shows that xG is a latent vector of $G^{-1}HG^{-1}$ corresponding to the root $\lambda$.
But $G^{-1}HG^{-1}$ is symmetric, and in fact Gramian, for it can be written in the form
$(G^{-1}K)(G^{-1}K)'$, where $KK' = H$. Hence, each $\lambda$ is nonnegative, and each
xG is real, whence each x is real.

The numerical solution of (6.9) can be carried out by the simple iterative 
technique for latent roots and vectors (see, for example, [6]). The iterations 
converge in general to the vector associated with the largest root. To avoid 
convergence to the trivial solution (which formally has the largest root), the 
trial vectors should be adjusted to satisfy (5.23); then they will in general 
converge to the vector associated with the largest non-trivial root. 
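
One way to organize these iterations is sketched below. It is a plain-Python illustration rather than the technique of [6]: the names `solve` and `largest_nontrivial_lambda` are invented for the example, the inverse is applied by Gaussian elimination instead of being formed explicitly, and the adjustment to (5.23) assumes that each $A_{jp}$ equals the corresponding row sum of C, as (5.10) implies.

```python
def solve(B, rhs):
    """Solve B y = rhs by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [[float(v) for v in B[r]] + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A + C is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def largest_nontrivial_lambda(H, A_diag, C, x0, iterations=200):
    """Iterate x <- x H (A + C)^(-1), re-centering x to weighted mean zero."""
    n = len(H)
    B = [[C[j][k] + (A_diag[j] if j == k else 0.0) for k in range(n)]
         for j in range(n)]
    total = float(sum(A_diag))
    x = [float(v) for v in x0]
    lam = 0.0
    for _ in range(iterations):
        # adjust the trial vector so that its A-weighted mean is zero, as in (5.23)
        mean = sum(a * v for a, v in zip(A_diag, x)) / total
        x = [v - mean for v in x]
        Hx = [sum(H[j][k] * x[k] for k in range(n)) for j in range(n)]
        Bx = [sum(B[j][k] * x[k] for k in range(n)) for j in range(n)]
        denom = sum(a * b for a, b in zip(x, Bx))
        if denom <= 0.0:
            raise ValueError("trial vector vanished after centering")
        lam = sum(a * b for a, b in zip(x, Hx)) / denom   # generalized Rayleigh quotient
        y = solve(B, Hx)        # H and A + C are symmetric, so this is the transpose of x H (A + C)^(-1)
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return lam, x
```

The returned `lam` estimates $\lambda = \tfrac{1}{2}E^2$, so the correlation ratio squared is twice that value.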

A marginal indication of the internal consistency of the judgments is the
agreement in sign of


$(x_{jp} + x_{kr}) - (x_{jq} + x_{ks})$

with


$\sum_i e_{ijk/pr,qs} - \sum_i e_{ijk/qs,pr}$


for each of the comparisons. If one combination is judged higher by more
people in comparison with another, then its x-values should exceed those of the
other for marginal consistency.


7. The maximum under certain linear restrictions. In the previous section,
no restrictions were placed on the $x_{jp}$ in maximizing $E^2$. For some problems,
the $O_{jp}$ may be quantitative, and it may be desired within each item to keep the
distances between the $x_{jp}$ proportionate to the distances between the $O_{jp}$. This
was the case for the score card, where a linear system of weighting had to be
used to be practicable for the army. It was necessary to derive a constant
multiplier for length of service, a constant multiplier for time overseas, etc.,
even though there were curvilinearities in the judgments.

Our principle enables us to handle such restrictions just as well as the unrestricted
case. We shall derive the set of multipliers which is most consistent
for the judgments in the sense of least squares. The ordering of categories
within an item will no longer be considered arbitrary. Instead, subscripts will
be assigned in a fashion to make $(O_{jp} - O_{jq})$ proportional to $(p - q)$ within
each item. For convenience, the subscripts can be assigned beginning from zero
for each item.

The linear restriction is to determine x-values in the form

(7.1)    $x_{jp} = \xi_j + p\,\eta_j,$


where the $\xi_j$ and the $\eta_j$ are now the basic unknowns to be solved for to maximize
$E^2$. It is the $\eta_j$ that are of interest, for they will be the multipliers; but the $\xi_j$
have to be used in the analysis to help determine the multipliers even though
they are only additive constants that will not affect the order of total scores of
people.
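
The remark that the additive constants do not affect the order of total scores can be illustrated with a short sketch. The function names and the representation of a person as a list of category subscripts are assumptions for the example, and the total score is taken to be the sum of the $x_{jp}$ over items.

```python
def restricted_value(xi, eta, j, p):
    """x_jp under the linear restriction (7.1): additive constant plus p times the multiplier."""
    return xi[j] + p * eta[j]


def total_score(xi, eta, categories):
    """Sum of x_jp over items, where categories[j] is the person's subscript p on item j."""
    return sum(restricted_value(xi, eta, j, p) for j, p in enumerate(categories))
```

Every person's total contains the same constant, the sum of the `xi`, so the difference between two people's totals depends only on the multipliers `eta` and on the differences in their category subscripts.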

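To maximize $E^2$ under the linear restrictions, we differentiate the right member
of (5.22) with respect to the $\xi_j$ and the $\eta_j$, set the derivatives equal to zero,
and obtain the stationary equations

(7.2)    $\dfrac{\partial R}{\partial \xi_j} = E^2 \dfrac{\partial W}{\partial \xi_j}$

(7.3)    $\dfrac{\partial R}{\partial \eta_j} = E^2 \dfrac{\partial W}{\partial \eta_j}.$


In order to evaluate the indicated derivatives, it is helpful to introduce some
more notation. Let:


(7.4)    $l_{0,ik} = \sum_r f_{ikr}, \qquad m_{0,ik} = \sum_r g_{ikr}$

(7.5)    $l_{1,ik} = \sum_r r f_{ikr}, \qquad m_{1,ik} = \sum_r r g_{ikr}$

(7.6)    $d_{a,jk} = \sum_p \sum_r p^a c_{jk/pr}$

(7.7)    $d_{11,jk} = \sum_p \sum_r p r\, c_{jk/pr} = d_{11,kj}$

(7.8)    $D_{a,j} = \sum_k \sum_p \sum_r p^a c_{jk/pr} = \sum_k d_{a,jk}$

(7.9)    $h_{0,jk} = \dfrac{1}{F} \sum_i (l_{0,ij} l_{0,ik} + m_{0,ij} m_{0,ik})$

(7.10)   $h_{1,jk} = \dfrac{1}{F} \sum_i (l_{1,ij} l_{0,ik} + m_{1,ij} m_{0,ik})$

(7.11)   $h_{2,jk} = \dfrac{1}{F} \sum_i (l_{1,ij} l_{1,ik} + m_{1,ij} m_{1,ik}).$


It is important to notice that $d_{0,jk} = d_{0,kj}$, but that $d_{1,jk} \neq d_{1,kj}$. Similarly,
$h_{0,jk} = h_{0,kj}$ and $h_{2,jk} = h_{2,kj}$, but $h_{1,jk} \neq h_{1,kj}$.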

To evaluate the derivatives of R, it is helpful to re-write the right members of
(5.13) and (5.14) by means of (7.1), (7.4), and (7.5):


(7.12)   $t_i = \dfrac{2}{F} \sum_k (\xi_k l_{0,ik} + \eta_k l_{1,ik})$


(7.13)   $u_i = \dfrac{2}{F} \sum_k (\xi_k m_{0,ik} + \eta_k m_{1,ik}).$

Differentiating the right member of (5.24) with respect to the $\xi_j$ and the $\eta_j$
respectively with the aid of (7.12) and (7.13), and using (7.9), (7.10), and (7.11),
yields

(7.14)   $\dfrac{\partial R}{\partial \xi_j} = 8 \sum_k (\xi_k h_{0,jk} + \eta_k h_{1,kj})$

(7.15)   $\dfrac{\partial R}{\partial \eta_j} = 8 \sum_k (\xi_k h_{1,jk} + \eta_k h_{2,jk}).$


For the derivatives of W, we re-write (5.25) using (7.1):


(7.16)   $W = \sum_j \sum_k \sum_p \sum_r (\xi_j + p\eta_j + \xi_k + r\eta_k)^2 c_{jk/pr}.$


Differentiating with respect to the $\xi_j$ and $\eta_j$ respectively, we obtain, using (7.6),
(7.7), and (7.8),

(7.17)   $\dfrac{\partial W}{\partial \xi_j} = 4\left[\xi_j D_{0,j} + \eta_j D_{1,j} + \sum_k (\xi_k d_{0,jk} + \eta_k d_{1,kj})\right]$

(7.18)   $\dfrac{\partial W}{\partial \eta_j} = 4\left[\xi_j D_{1,j} + \eta_j D_{2,j} + \sum_k (\xi_k d_{1,jk} + \eta_k d_{11,jk})\right].$

The stationary equations (7.2) and (7.3) can now be re-written by means of
(7.14), (7.15), (7.17), and (7.18) as:


(7.19)   $\sum_k (\xi_k h_{0,jk} + \eta_k h_{1,kj}) = \tfrac{1}{2} E^2 \left[\xi_j D_{0,j} + \eta_j D_{1,j} + \sum_k (\xi_k d_{0,jk} + \eta_k d_{1,kj})\right]$


(7.20)   $\sum_k (\xi_k h_{1,jk} + \eta_k h_{2,jk}) = \tfrac{1}{2} E^2 \left[\xi_j D_{1,j} + \eta_j D_{2,j} + \sum_k (\xi_k d_{1,jk} + \eta_k d_{11,jk})\right].$

These are the equations to be solved numerically for the maximizing $\xi_j$ and $\eta_j$.

Before showing a procedure for the numerical solution, let us verify that a
solution of (7.19) and (7.20) will satisfy (5.23). From (7.1), (5.17), and (7.8),

(7.21)   $V = \dfrac{2}{C} \sum_k (\xi_k D_{0,k} + \eta_k D_{1,k}).$


Summing both members of (7.19) over j shows that

$(1 - E^2) \sum_k (\xi_k D_{0,k} + \eta_k D_{1,k}) = 0,$


or, from (7.21),

$(1 - E^2)V = 0.$


Hence, if $E^2 \neq 1$, the corresponding solution will satisfy the condition that V = 0.

As in the unrestricted case, there is always a trivial solution that will yield an
$E^2$ formally equal to unity. This trivial solution is $\xi_j = 1$, $\eta_j = 0$, which makes
$x_{jp} = 1$ as in the previous case. These values satisfy (7.19) and (7.20), and
have $E^2 = 1$. Of course, E is again not an actual correlation ratio for this trivial
solution.

To obtain a non-trivial solution, it is convenient to write (7.19) and (7.20) in
matric notation. Let


(7.22)   $z = \| [\xi_j] \;\; [\eta_j] \|.$

z is a row vector of 2n elements, the first n elements being the $\xi_j$ and the last n
elements being the $\eta_j$. Let


(7.23)   $h = \begin{Vmatrix} [h_{0,jk}] & [h_{1,kj}] \\ [h_{1,jk}] & [h_{2,jk}] \end{Vmatrix}.$


h is 2n × 2n and is symmetric; in fact it is also Gramian, since its elements are
product sums. Let $\delta_{jk}$ be Kronecker's delta, and let

(7.24)   $c = \begin{Vmatrix} [D_{0,j}\delta_{jk} + d_{0,jk}] & [D_{1,j}\delta_{jk} + d_{1,kj}] \\ [D_{1,j}\delta_{jk} + d_{1,jk}] & [D_{2,j}\delta_{jk} + d_{11,jk}] \end{Vmatrix}.$


c also is 2n × 2n, symmetric, and Gramian. Again let


(7.25)   $\lambda = \tfrac{1}{2}E^2.$

Equations (7.19) and (7.20) can now be stated as a single matric equation:

(7.26)   $zh = \lambda zc.$


In general, c will be nonsingular, so that it will have an inverse by which both
members of (7.26) can be postmultiplied to yield


(7.27)   $zhc^{-1} = \lambda z.$


Therefore z is a latent vector of $hc^{-1}$, and $\lambda$ is a latent root. Since we want the
largest correlation ratio, we seek the largest of the non-trivial latent roots.
The largest root in practice will ordinarily be unique. There is then a unique
latent vector corresponding to this root, and the elements of this vector provide
the most consistent $\xi_j$ and $\eta_j$ for the population in the sense of least squares.

That c is Gramian and in general nonsingular, that the latent roots of $hc^{-1}$
are all nonnegative, and that the latent vectors of $hc^{-1}$ are all real, requires only
proofs analogous to those for the corresponding properties of A + C and
$H(A + C)^{-1}$ in the previous section, which need not be repeated here.

As in the previous section, the final numerical steps can be carried out by
iterations according to (7.27). Again, the trial vectors should be adjusted to
conform to (5.23) to prevent convergence to the trivial solution.

A marginal indication of the consistency of the quantification is the agreement
in sign of


$(p - q)\eta_j + (r - s)\eta_k$

with

$\sum_i e_{ijk/pr,qs} - \sum_i e_{ijk/qs,pr}$


for all comparisons.

Appendix: A distinction between the conventional principle and the present
principle. The relationship between the conventional principle of estimating
means of hypothetical distributions and the present principle of reproducing
the comparisons of each individual will be analyzed here for the case of ordinary
comparisons. Only the principles will be contrasted here.

In the conventional approach, it is assumed that each of the N individuals
has a numerical value for each of the $O_j$. Let $s_{ij}$ be such a value of $O_j$ for the
ith individual. The hypothesis is that person i makes the judgment $O_j > O_k$ if
$s_{ij} > s_{ik}$; and the conventional problem is to estimate from the judgments what
the relative distances are between the means $\mu_j$, where


(A.1)    $\mu_j = \dfrac{1}{N} \sum_i s_{ij}.$


The ranges of the subscripts are: $i = 1, 2, \ldots, N$; $j, k, l = 1, 2, \ldots, n$; and will
not be explicitly indicated.

According to the approach of this paper, if we are to consider hypothetical
variables, the problem would be to determine for each $O_j$ a numerical value $x_j$
such that the differences $(x_j - x_k)$ will best approximate the $(s_{ij} - s_{ik})$ for each
individual in the sense of least squares. This will separate "higher than" x-values
from "lower than" x-values. If we let


(A.2)    $Z = \sum_i \sum_j \sum_k [(s_{ij} - s_{ik}) - w_i (x_j - x_k)]^2,$


where $w_i$ is a constant of proportionality to be determined for each individual
separately, then the problem is to determine the $x_j$ and the $w_i$ which will minimize
Z.

Differentiating Z with respect to the $x_j$ and $w_i$ respectively, and setting the
derivatives equal to zero, yields the stationary equations


(A.3)    $\sum_i w_i [(s_{ij} - \bar{s}_i) - w_i (x_j - \bar{x})] = 0$


(A.4)    $\sum_j (x_j - \bar{x})(s_{ij} - w_i x_j) = 0,$

where

(A.5)    $\bar{s}_i = \dfrac{1}{n} \sum_j s_{ij}, \qquad \bar{x} = \dfrac{1}{n} \sum_j x_j.$


Since Z is invariant with respect to translations of the $x_j$ (also to translations
of the $s_{ij}$), the origin of the $x_j$ is arbitrary, and there is no loss in generality in
setting


(A.6)    $\bar{x} = 0.$

Then if we let

(A.7)    $\alpha = \sum_i w_i^2, \qquad \beta = \sum_j x_j^2,$

equations (A.3) and (A.4) can be re-written respectively as

(A.8)    $\sum_i w_i (s_{ij} - \bar{s}_i) = \alpha x_j,$

(A.9)    $\sum_k x_k s_{ik} = \beta w_i.$


By summing both members of (A.8) over j, we see that


(A.10)   $\alpha \sum_j x_j = 0.$


Therefore, since in general $\alpha > 0$, we must have $\bar{x} = 0$; and a solution of (A.8)
will necessarily be consistent with (A.6).


Using (A.9) in (A.8) yields the stationary equations for the $x_j$ alone:


(A.11)   $\sum_k x_k \sum_i s_{ik}(s_{ij} - \bar{s}_i) = \alpha\beta x_j.$


This shows that the $x_j$ are elements of a latent vector corresponding to a latent
root $\alpha\beta$ of the n × n matrix defined by the elements $S_{jk}$, where


(A.12)   $S_{jk} = \sum_i s_{ik}(s_{ij} - \bar{s}_i) = \sum_i s_{ij} s_{ik} - \dfrac{1}{n} \sum_i \sum_l s_{ik} s_{il}.$


To determine which one of the latent roots provides the minimum Z, we first
notice, by multiplying both members of (A.9) by $w_i$, summing over i, and using
(A.7), that

(A.13)   $\sum_i \sum_k x_k s_{ik} w_i = \alpha\beta.$


Then expanding the right member of (A.2) with the aid of (A.9) and (A.13), we
obtain


(A.14)   $Z/2n = \sum_i \sum_j (s_{ij} - \bar{s}_i)^2 - \alpha\beta.$


Clearly, Z will be minimized if we use the largest $\alpha\beta$. Therefore, we seek the
latent vector associated with the largest latent root of $\| S_{jk} \|$.
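
The relation (A.14) between Z and the product $\alpha\beta$ can be checked numerically. The sketch below is an illustration with invented names, not part of the original analysis: it forms $S_{jk}$ from a small table of hypothetical values $s_{ij}$ as in (A.12), finds a latent vector by simple repeated multiplication with a fixed number of iterations (an assumption), recovers the $w_i$ from (A.9), and then evaluates Z both directly from (A.2) and from (A.14).

```python
def fit_hypothetical(s, iterations=500):
    """Least-squares x_j and w_i for a table s[i][j] of hypothetical values.

    Returns (x, w, z_direct, z_formula); the last two should agree.
    """
    N, n = len(s), len(s[0])
    sbar = [sum(row) / n for row in s]
    # (A.12): S_jk = sum_i s_ik (s_ij - sbar_i)
    S = [[sum(s[i][k] * (s[i][j] - sbar[i]) for i in range(N)) for k in range(n)]
         for j in range(n)]
    x = [j - (n - 1) / 2.0 for j in range(n)]          # arbitrary centered start
    for _ in range(iterations):
        y = [sum(S[j][k] * x[k] for k in range(n)) for j in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            raise ValueError("degenerate table: no variation within individuals")
        x = [v / norm for v in y]                       # keeps beta = 1
    beta = sum(v * v for v in x)
    w = [sum(x[k] * s[i][k] for k in range(n)) / beta for i in range(N)]   # (A.9)
    alpha = sum(v * v for v in w)
    z_direct = sum(((s[i][j] - s[i][k]) - w[i] * (x[j] - x[k])) ** 2
                   for i in range(N) for j in range(n) for k in range(n))  # (A.2)
    within = sum((s[i][j] - sbar[i]) ** 2 for i in range(N) for j in range(n))
    z_formula = 2 * n * (within - alpha * beta)                            # (A.14)
    return x, w, z_direct, z_formula
```

Because each multiplication by $\| S_{jk} \|$ yields a vector whose elements sum to zero, the iterates satisfy (A.6) automatically after the first step.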


To examine the relation of the elements of this minimizing latent vector to the
means $\mu_j$ of the hypothetical variables, denote the variances and correlations
of the hypothetical variables by:


(A.15)   $\sigma_j^2 = \dfrac{1}{N} \sum_i (s_{ij} - \mu_j)^2 = \dfrac{1}{N} \sum_i s_{ij}^2 - \mu_j^2$


(A.16)   $\rho_{jk} = \dfrac{\sum_i (s_{ij} - \mu_j)(s_{ik} - \mu_k)}{N \sigma_j \sigma_k} = \dfrac{\frac{1}{N} \sum_i s_{ij} s_{ik} - \mu_j \mu_k}{\sigma_j \sigma_k}.$


Then

(A.17)   $\sum_i s_{ij} s_{ik} = N(\sigma_j \sigma_k \rho_{jk} + \mu_j \mu_k).$

From (A.17) and the last member of (A.12), we can write


(A.18)   $\dfrac{1}{N} S_{jk} = \sigma_j \sigma_k \rho_{jk} + \mu_j \mu_k - \dfrac{1}{n} \sum_l (\sigma_k \sigma_l \rho_{kl} + \mu_k \mu_l).$


The elements of the matrix of which the $x_j$ are a latent vector are now expressed
in terms of the means, variances, and correlations of the hypothetical
variables, according to the right member of (A.18). It is clear that in general,
the $\mu_j$ are not elements of a latent vector of $\| S_{jk} \|$, so that our approach is in
general not equivalent to the conventional approach.

In the special case of equal variances and correlations, such as is often assumed
in the conventional approach,⁴ we can now see that the $\mu_j$ do define a
latent vector. For this case, let the common variance be $\sigma^2$, and let the common
correlation coefficient be $\rho$. Then


(A.19)   $\rho_{jk} = \rho + \delta_{jk}(1 - \rho),$

where $\delta_{jk}$ is Kronecker's delta; and (A.18) becomes


(A.20)   $\dfrac{1}{N} S_{jk} = \sigma^2 (1 - \rho)\left(\delta_{jk} - \dfrac{1}{n}\right) + (\mu_j - \bar{\mu})\mu_k,$

where


(A.21)   $\bar{\mu} = \dfrac{1}{n} \sum_j \mu_j.$

From (A.20) and (A.12), (A.11) becomes converted to

(A.22)   $[\gamma - \sigma^2 (1 - \rho)]\, x_j = (\mu_j - \bar{\mu}) \sum_k \mu_k x_k,$


where

(A.23)   $\gamma = \alpha\beta/N.$

Multiplying both members of (A.22) by $x_j$ and summing over j shows that

(A.24)   $\left(\sum_j \mu_j x_j\right)^2 = \beta[\gamma - \sigma^2 (1 - \rho)].$


From (A.22) and (A.24) we obtain the elements of the minimizing latent vector
for Z to be, in normalized form,

(A.25)   $\dfrac{x_j}{\sqrt{\beta}} = \dfrac{\mu_j - \bar{\mu}}{\sqrt{\gamma - \sigma^2 (1 - \rho)}}.$

That this is the minimizing vector follows from the fact that the remaining
latent roots must all have $\gamma = \sigma^2 (1 - \rho)$ in order to have vectors distinct from
(A.25); (A.25) does correspond to the largest nontrivial root, since for it the
root satisfies the inequality $\gamma > \sigma^2 (1 - \rho)$. (The remaining latent vectors are
not uniquely defined, for they all correspond to equal roots.) Therefore, the
means of the hypothetical variables are a linear function of the elements of the
minimizing latent vector for the case of equal variances and correlations.

⁴ More specifically, zero correlations are assumed, but this is not necessary for our
purpose.
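
The claim for the equal-variance case can be verified numerically from (A.20). In the sketch below (function names are invented for the illustration) the matrix with elements $S_{jk}/N$ is built from assumed means, a common variance, and a common correlation, and the deviations $\mu_j - \bar{\mu}$ are tested as a latent vector. The root used in the check, $\gamma = \sigma^2(1 - \rho) + \sum_k (\mu_k - \bar{\mu})^2$, is not stated in the text; it is what (A.22) gives when $x_j$ is set proportional to $\mu_j - \bar{\mu}$.

```python
def equal_variance_matrix(mu, sigma2, rho):
    """Elements S_jk / N from (A.20) for common variance sigma2 and correlation rho."""
    n = len(mu)
    mubar = sum(mu) / n
    return [[sigma2 * (1.0 - rho) * ((1.0 if j == k else 0.0) - 1.0 / n)
             + (mu[j] - mubar) * mu[k]
             for k in range(n)] for j in range(n)]


def check_mean_vector(mu, sigma2, rho):
    """Return (gamma, largest absolute residual of M x - gamma x) for x_j = mu_j - mubar."""
    n = len(mu)
    mubar = sum(mu) / n
    x = [m - mubar for m in mu]
    M = equal_variance_matrix(mu, sigma2, rho)
    Mx = [sum(M[j][k] * x[k] for k in range(n)) for j in range(n)]
    gamma = sigma2 * (1.0 - rho) + sum(v * v for v in x)
    residual = max(abs(Mx[j] - gamma * x[j]) for j in range(n))
    return gamma, residual
```

A residual of zero (up to rounding) confirms that the deviations of the means form a latent vector, and the returned `gamma` exceeds $\sigma^2(1 - \rho)$ whenever the means are not all equal.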

As a final comment, it should be pointed out that paired comparisons are 
insufficient to estimate the hypothetical values. Two persons with widely 
different hypothetical values will make the same judgments provided only that 
their values have the same rank order. Therefore, hypotheses about variables 
presumed to underlie the comparisons cannot be completely tested only on the 
basis of the comparisons. 

Psychologically, it may or may not be proper to assume that judgments of the
type $O_j > O_k$ can be expressed as a function of differences $s_{ij} - s_{ik}$. Perhaps,
psychologically, comparisons may operate on some more complicated principle.
The approach presented in the body of this paper does not assume anything
about underlying variables, but simply seeks a set of numerical values that will
best help reproduce the observed data for each individual.

RELATIVE ACCURACY OF SYSTEMATIC AND STRATIFIED RANDOM
SAMPLES FOR A CERTAIN CLASS OF POPULATIONS¹


By W. G. Cochran


Iowa State College 


1. Summary. A type of population frequently encountered in extensive 
samplings is one in which the variance within a group of elements increases 
steadily as the size of the group increases. This class of populations may be 
represented by a model in which the elements are serially correlated, the correla- 
tion between two elements being a positive and monotone decreasing function 
of the distance apart of the elements. For populations of this type, the relative 
efficiencies are compared for a systematic sample of every kth element, a stratified 
random sample with one element per stratum and a random sample. 

The stratified random sample is always at least as accurate on the average 
as the random sample and its relative efficiency is a monotone increasing function 
of the size of the sample. No general result is valid for the relative efficiency of 
the systematic sample. In fact, there are populations in the class in which the 
systematic sample is more accurate than the stratified sample for one sampling 
rate, but is less accurate than the random sample for another sampling rate. 
If, however, the correlogram is in addition concave upwards, the systematic 
sample is on the average more accurate than the stratified sample for any size 
of sample. 


Some numerical results are given for the cases in which the correlogram is (i) 
linear (ii) exponential. 


¹ Journal paper No. J-1341 of the Iowa Agricultural Experiment Station, Ames, Iowa.
Project 891.

2. Introduction. We consider a finite population consisting of the elements
$x_1, x_2, \ldots, x_{nk}$, where n and k are integers. A systematic sample is drawn by
choosing an element at random from the elements $x_1, \ldots, x_k$, and then selecting
every kth consecutive element. That is, if $x_i$ is the element first chosen, the
systematic sample comprises the elements $x_i, x_{i+k}, \ldots, x_{i+(n-1)k}$. This type
of sample has found considerable use in practice, because it is often easier to
select and to administer than a random or stratified random sample and because
it has an intuitive appeal through spreading the sample evenly over the population.
Much remains to be learned, however, about the accuracy of this systematic
sample relative to that of comparable random or restricted random samples.
Probably the most relevant comparison is that between the systematic sample
and the stratified random sample having one element per stratum. In the latter
case, the population is divided into the n strata $\{x_1, \ldots, x_k\}$, $\{x_{k+1}, \ldots, x_{2k}\}$, $\ldots$,
and one element is chosen independently at random from each of the
strata. This type of sample is similar in many respects to the systematic
sample. Both divide the population into the same n strata of k elements each,
with one element chosen from each stratum. Moreover, neither sample provides
the data for an unbiased estimate of the sampling variance of the sample mean,
at least in the sense that the estimate is unbiased whatever the form of the
population of elements $x_i$.
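
The two sampling schemes just described can be sketched in a few lines. The function names are invented for the illustration, elements are identified by their subscripts 1 to nk, and a `random.Random` instance is passed in so that a draw can be reproduced.

```python
import random


def systematic_sample(n, k, rng):
    """Subscripts i, i+k, ..., i+(n-1)k, with the start i drawn at random from 1..k."""
    start = rng.randint(1, k)
    return [start + a * k for a in range(n)]


def stratified_sample(n, k, rng):
    """One subscript drawn independently at random from each stratum of k consecutive elements."""
    return [a * k + rng.randint(1, k) for a in range(n)]
```

Both functions return one subscript per stratum; they differ only in that the systematic sample uses the same position within every stratum, while the stratified sample draws the position afresh each time.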

The first thorough investigation of the properties of systematic samples was 
made by W. G. and L. H. Madow [1]. In particular, these authors compared 
the accuracies of a systematic sample and a stratified random sample of the types 
described above for several types of finite population. Where the elements in 
the population lie on the line $x_i = i$, they showed that the stratified random
sample, with one element per stratum, is more accurate than the systematic 
sample. If the population has a periodic distribution, the stratified random 
sample is superior when k is an integral multiple of the period, but the system- 
atic sample is superior when k is an odd multiple of the half-period. The authors 
also considered the more complex case where the population contains both a trend 
function and a periodic function. 
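
The result quoted for a population on the line $x_i = i$ can be reproduced by direct enumeration. The sketch below is an independent check with invented function names, not the derivation given in [1]; it uses exact fractions and takes the variance of the sample mean over all equally likely samples of each kind.

```python
from fractions import Fraction


def systematic_variance(n, k):
    """Variance of the systematic sample mean for x_i = i, over the k possible starts."""
    means = [Fraction(sum(i + a * k for a in range(n)), n) for i in range(1, k + 1)]
    grand = sum(means) / k
    return sum((m - grand) ** 2 for m in means) / k


def stratified_variance(n, k):
    """Variance of the stratified sample mean for x_i = i, one element per stratum."""
    total = Fraction(0)
    for a in range(n):
        stratum = [a * k + i for i in range(1, k + 1)]
        mean = Fraction(sum(stratum), k)
        total += sum((v - mean) ** 2 for v in stratum) / k   # variance of one random draw
    return total / n ** 2                                    # independent draws, mean of n values
```

With n = 4 and k = 5 the systematic variance is 2 and the stratified variance is 1/2, in line with the statement that the stratified sample is the more accurate for a linear trend.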

The object of this paper is to make similar comparisons for another type of 
population which appears to be fairly frequently encountered in extensive 
samplings. The population is one in which the variance among the elements in 
any group of contiguous elements increases steadily as the size of the group 
increases. This type of population has long been regarded as applicable in field 
experimental work, where the variance among plots within a block is found 
usually to increase with the size of block. Summarizing data from 40 uniformity 
trials, Fairfield Smith [2] verified this notion and derived an empirical relation-
ship from which the rate of increase may be estimated. The same type of popu- 
lation is also considered in several recent papers on extensive sample surveys. 
Thus, in a discussion of methods for sampling farm populations, Jessen [3] 
postulated a law in which the variance among farms within a grid is a monotone 
increasing function of the size of the grid and used the law for estimating the 
optimum number of farms which should be included in a sampling-unit. 
Mahalanobis [4] independently developed the same law as Fairfield Smith in a 
comprehensive investigation of large-scale sample surveys. Hansen and Hurwitz 
[5] referred to the increase in variance within a cluster with growing size of cluster
as typical of many actual populations. Numerous other references could be
given.


3. Specification of the population. Various mathematical models may be 
constructed to represent the situation in which the variance within any group 
increases with increasing size of group. For instance, we might consider that
the elements $x_i$ are drawn from different populations, the population changing in
some regular manner with $i$. Alternatively, the $x_i$ may be assumed to belong
to the same population, but to be serially correlated. For simplicity, we assume
further that the serial correlation between $x_i$ and $x_{i+u}$ is some quantity $\rho_u$ which
depends only on $u$. Then if $\rho_u$ is positive and is a monotone decreasing function
of $u$, it may be expected from intuition (and will be proved later) that the
variance within the group of elements $x_i, x_{i+1}, \ldots, x_{i+k}$ is a monotone increasing
function of $k$. This model seems appropriate for our purpose, since many writers
refer explicitly to positive correlations between the $x$'s as the basis for the
phenomenon of increasing variance.

The specification above will be qualified in one respect. To assume that the
$\rho$'s are strictly monotone for an actual finite population of only moderate size
does not seem realistic. While the correlogram may exhibit a definite downward
trend, yet individual fluctuations about the trend prevent the correlogram from
being strictly monotone. It is more reasonable to regard the finite population
as being itself a sample from an infinite population in which the $\rho$'s are monotone.
This attitude is, I believe, in accord with that of the authors referred to above,
who, as I interpret their writings, regard the variance law as holding in an idealized
population. Thus, comparisons between the systematic and stratified random
samples will be made not for a single finite population, but for the average of
finite populations drawn from an infinite population with monotone decreasing $\rho$.
Results for an individual finite population will differ from the average results
because the $r$'s which appear in the population fluctuate about their expectations
$\rho$. As the finite population becomes larger, its results will tend to coincide with
the average results.

Accordingly, the elements $x_i$, $i = 1, 2, \ldots, nk$, are assumed to be drawn from
a population in which

$E(x_i) = \mu, \quad E(x_i - \mu)^2 = \sigma^2, \quad E(x_i - \mu)(x_{i+u} - \mu) = \rho_u \sigma^2,$

where $\rho_u \geq \rho_v \geq 0$, whenever $u < v$.


4. Some useful preliminary formulas. If $\bar{x}$ is the mean of a specified finite
population, the following algebraic identity, frequently useful in the analysis of
variance, is easily established.

(1)  $(kn) \sum_{i=1}^{kn} (x_i - \bar{x})^2 = \sum_{i=1}^{kn} \sum_{j>i} (x_i - x_j)^2.$

Since there are $(kn)(kn - 1)/2$ possible pairs of values $(x_i, x_j)$, this gives

(2)  $\sum_{i=1}^{kn} (x_i - \bar{x})^2 = \frac{(kn - 1)}{2} E(x_i - x_j)^2 = \frac{(kn - 1)}{2} E\{(x_i - \mu) - (x_j - \mu)\}^2,$

where $E$ is taken over the finite population. Now expand the quadratic and
average over all finite populations. In the $(kn)(kn - 1)/2$ combinations, there
are $(kn - 1)$ in which $j$ exceeds $i$ by 1, $(kn - 2)$ in which $j$ exceeds $i$ by 2, and
so on. Hence

(3)  $E \sum_{i=1}^{kn} (x_i - \bar{x})^2 = (kn - 1)\sigma^2 \left\{1 - \frac{2}{kn(kn - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u\right\}.$
To obtain the corresponding expectation for the sum of squares within a single
stratum of $k$ consecutive elements, we need only replace $(kn)$ by $k$ in (3). Since
the result is the same for all $n$ strata, we obtain

(4)  $E(\text{S.S. within strata}) = n(k - 1)\sigma^2 \left\{1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u\right\}.$

Formula (3) also gives the expected sum of squares within a specified systematic
sample if we replace $(kn)$ by $n$ and $u$ by $(ku)$, since there are $n$ elements in
the sample and since the correlations between successive elements are $\rho_k, \rho_{2k}, \ldots$
instead of $\rho_1, \rho_2, \ldots$. The result is the same for each of the $k$ systematic
samples. Hence

(5)  $E(\text{S.S. within systematic samples}) = k(n - 1)\sigma^2 \left\{1 - \frac{2}{n(n - 1)} \sum_{u=1}^{n-1} (n - u)\rho_{ku}\right\}.$






5. Average variance for a random sample. The symbols $\sigma_r^2$, $\sigma_{st}^2$, $\sigma_{sy}^2$ will be
used to denote the average variances of the means of the random, stratified random
and systematic samples, respectively, about the mean of the finite population,
this average being taken over all finite populations drawn from the infinite
population specified in the previous section. Comparisons with the random
sample, though not our main purpose, will be included where they are of interest.

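For a single finite population, it has been shown by several writers that the
variance of the mean of a random sample is

(6)  $\frac{1}{n} \cdot \frac{(kn - n)}{kn} \cdot \frac{\sum_{i=1}^{kn} (x_i - \bar{x})^2}{(kn - 1)},$

where $\bar{x}$ is the mean of the finite population.
From (3), we obtain

(7)  $\sigma_r^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{1 - \frac{2}{(kn)(kn - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u\right\}.$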


6. Average variance for a stratified random sample. If $\bar{x}_{st}$ is the mean of a
typical stratified random sample, the sampling variance of $\bar{x}_{st}$ is by definition

(8)  $E(\bar{x}_{st} - \bar{x})^2.$

Consider first the average over a single finite population. Let $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$
be the means of the $n$ strata, respectively, and let $x_{1j}, x_{2j}, \ldots, x_{nj}$ be the elements
selected from the respective strata. Then (8) may be written

(9)  $\frac{1}{n^2} E\{(x_{1j} - \bar{x}_1) + (x_{2j} - \bar{x}_2) + \cdots + (x_{nj} - \bar{x}_n)\}^2,$

since $\sum x_{ij} = n\bar{x}_{st}$ and $\sum \bar{x}_i = n\bar{x}$.
Take the average over all $k^n$ samples from the finite population. All cross-product
terms vanish, since, for example, $x_{1j}$ appears equally often with $x_{21}, x_{22}, \ldots, x_{2k}$.
This gives

(10)  $\frac{1}{kn^2} \sum_{i=1}^{n} \sum_{j=1}^{k} (x_{ij} - \bar{x}_i)^2$

for the variance for a single finite population. The sum of squares involved is,
of course, simply the sum of squares within strata. Hence, by (4)

(11)  $\sigma_{st}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u\right\}.$

7. Average variance for the systematic sample. If $\bar{x}_{sy}$ is the mean of a typical
sample, the variance for a single finite population is

(12)  $E(\bar{x}_{sy} - \bar{x})^2,$

where the average is taken over the $k$ systematic samples. Since the sum of squares
among samples is equal to the total sum of squares in the population minus the
sum of squares within samples, (12) equals

(13)  $\frac{1}{kn} \sum_{i=1}^{kn} (x_i - \bar{x})^2 - \frac{1}{kn}(\text{S.S. within systematic samples}).$

To obtain the average over all finite populations we substitute from (3) and
(5) for the first and second terms respectively. The result is

(14)  $\sigma_{sy}^2 = \frac{(kn - 1)}{kn}\sigma^2\left\{1 - \frac{2}{(kn)(kn - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u\right\} - \frac{(n - 1)}{n}\sigma^2\left\{1 - \frac{2}{n(n - 1)} \sum_{u=1}^{n-1} (n - u)\rho_{ku}\right\}.$

This reduces to

(15)  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{1 - \frac{2}{kn(k - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u + \frac{2k}{n(k - 1)} \sum_{u=1}^{n-1} (n - u)\rho_{ku}\right\}.$

It should be noted that the formulas and notations above are different from
those used by the Madows, who define $\rho$ and $\sigma^2$ with reference to a single finite
population and discuss the sample variances for a single finite population.
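
As a numerical companion to formulas (7), (11) and (15), the following Python sketch evaluates the three average variances for any correlogram supplied as a function of the lag. The function names and the choice of passing `rho` as a callable are illustrative assumptions and do not appear in the paper; the sketch assumes $k \geq 2$.

```python
def bracket_random(rho, n, k):
    """Curly bracket of formula (7): 1 minus a weighted mean of rho_1..rho_{nk-1}."""
    N = n * k
    total = sum((N - u) * rho(u) for u in range(1, N))
    return 1.0 - 2.0 * total / (N * (N - 1))


def bracket_stratified(rho, n, k):
    """Curly bracket of formula (11); only lags 1..k-1 enter (n is unused)."""
    total = sum((k - u) * rho(u) for u in range(1, k))
    return 1.0 - 2.0 * total / (k * (k - 1))


def bracket_systematic(rho, n, k):
    """Curly bracket of formula (15): two linear functions of the rho's."""
    N = n * k
    all_lags = sum((N - u) * rho(u) for u in range(1, N))
    multiples_of_k = sum((n - u) * rho(k * u) for u in range(1, n))
    return (1.0
            - 2.0 * all_lags / (N * (k - 1))
            + 2.0 * k * multiples_of_k / (n * (k - 1)))


def average_variances(rho, n, k, sigma2=1.0):
    """Return (sigma_r^2, sigma_st^2, sigma_sy^2) for n strata of k elements."""
    scale = sigma2 / n * (1.0 - 1.0 / k)
    return (scale * bracket_random(rho, n, k),
            scale * bracket_stratified(rho, n, k),
            scale * bracket_systematic(rho, n, k))


if __name__ == "__main__":
    # Hypothetical example: a linear correlogram rho_u = 1 - u/30 with n = 4, k = 5.
    linear = lambda u: 1.0 - u / 30.0
    print(average_variances(linear, n=4, k=5))
```

All three variances share the factor $(\sigma^2/n)(1 - 1/k)$, so comparisons between the sampling methods reduce to comparisons of the three bracket functions.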


8. Relative accuracies of random and stratified random samples. First, some
general comments. From (7), (11) and (15) the relative efficiencies of the three
types of sample are seen to depend only on the linear functions of the $\rho$'s which
appear in $\sigma_r^2$, $\sigma_{st}^2$, and $\sigma_{sy}^2$. It is easy to verify that in each case the sum of the
coefficients of the $\rho$'s is unity. For the random sample, the linear function involves
every serial correlation up to lag $(kn - 1)$ with coefficients which decrease
linearly as the lag increases and are independent of the size of sample, depending
only on $N = (kn)$, the number of elements in the finite population. For the
stratified random sample, only serial correlations with lags up to $(k - 1)$ appear,
$k$ being the number of elements in the stratum. As presented in (15), the
formula for the systematic sample is separated into two linear functions. The
first is the same function as appears in the formula for the random sample except
that all coefficients are $(kn - 1)/(k - 1)$ times as large. The second, which
carries a positive sign, involves correlations where the lag is a multiple of $k$.

Thus far the formulae require no restrictions on the $\rho$'s. In considering the
case where the $\rho$'s are positive and monotone decreasing, the following lemma is
helpful.

LEMMA. If $\rho_i$, $(i = 1, \ldots, m)$, are positive and monotone decreasing, that is,
$\rho_i \geq \rho_{i+1} \geq 0$, and if $(a_1 + a_2 + \cdots + a_m)$ is zero, the necessary and sufficient
conditions that

(16)  $L = a_1\rho_1 + a_2\rho_2 + \cdots + a_m\rho_m \geq 0$, for all admissible sets of $\rho$'s,

are

(17)  $a_1 + a_2 + \cdots + a_i \geq 0, \quad i = 1, 2, \ldots, (m - 1).$

For let $\rho_i = \rho_{i+1} + \delta_i$, where by hypothesis $\delta_i \geq 0$. Then if we substitute
successively for $\rho_1, \rho_2, \ldots, \rho_{m-1}$ in terms of $\delta_1, \delta_2, \ldots, \delta_{m-1}$, we find

(18)  $L = a_1\delta_1 + (a_1 + a_2)\delta_2 + (a_1 + a_2 + a_3)\delta_3 + \cdots + (a_1 + a_2 + \cdots + a_{m-1})\delta_{m-1},$

the final term in $\rho_m$ vanishing because $(a_1 + \cdots + a_m)$ is zero. Since all $\delta_i \geq 0$,
the sufficiency of (17) is obvious. Also, if for any $i$ the coefficient of $\delta_i$ is negative,
we can make $L$ negative by choosing that $\delta_i$ as positive and all other $\delta$'s as zero.
This establishes necessity.

COROLLARY. If $\rho_i$ are strongly monotone, i.e., $\rho_i > \rho_{i+1}$, and if at least one of
the $a_i$ is different from zero, conditions (17) are sufficient to establish that $L$ exceeds
zero. For in (18) all the $\delta$'s are greater than zero and by (17) none of the $\delta$'s has
a negative coefficient. Further, the coefficient of at least one of the $\delta$'s must
exceed zero, otherwise all the $a$'s would be zero. Hence $L > 0$.
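
The lemma can be illustrated numerically. The sketch below checks condition (17) for a coefficient vector and then evaluates $L$ on randomly drawn monotone sets of $\rho$'s. The helper names and the use of sorted uniform draws to generate admissible $\rho$'s are assumptions made for illustration; `coefficients_21` encodes the coefficients $2(k + 1 - 2u)/k(k^2 - 1)$, which sum to zero.

```python
import random


def prefix_sums_nonnegative(a, tol=1e-12):
    """Condition (17): every partial sum a_1 + ... + a_i with i < m is >= 0."""
    total = 0.0
    for coeff in a[:-1]:
        total += coeff
        if total < -tol:
            return False
    return True


def linear_form(a, rho):
    """L = a_1 rho_1 + ... + a_m rho_m, as in (16)."""
    return sum(c * r for c, r in zip(a, rho))


def random_monotone_rho(m, rng):
    """Draw rho_1 >= ... >= rho_m >= 0 by sorting uniform variates."""
    return sorted((rng.random() for _ in range(m)), reverse=True)


def coefficients_21(k):
    """Coefficients 2(k + 1 - 2u) / k(k^2 - 1) of rho_1..rho_k; they sum to zero."""
    scale = 2.0 / (k * (k * k - 1))
    return [scale * (k + 1 - 2 * u) for u in range(1, k + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    a = coefficients_21(8)
    print("condition (17) holds:", prefix_sums_nonnegative(a))
    smallest = min(linear_form(a, random_monotone_rho(len(a), rng))
                   for _ in range(1000))
    print("smallest L over random monotone rho's:", smallest)
```

A coefficient vector that violates (17), such as $a = (-1, 1)$, gives a negative $L$ for the admissible choice $\rho = (1, 0)$, which is the necessity argument in miniature.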

We now show that if the $\rho_u$ are monotone decreasing,

(19)  $L(k) = \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u$

is a monotone decreasing function of $k$. This is the linear function which appears
in the variance of the stratified sample.

(20)  $L(k) - L(k + 1) = \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u - \frac{2}{k(k + 1)} \sum_{u=1}^{k} (k + 1 - u)\rho_u$

(21)  $\qquad = \frac{2}{k(k^2 - 1)} \sum_{u=1}^{k} (k + 1 - 2u)\rho_u.$

Since the sums of the coefficients of the $\rho_u$ are unity in $L(k)$ and $L(k + 1)$,
the sum is zero in (21). Hence the lemma may be applied. But it is obvious
that the sum of the first $i$ coefficients in (21) exceeds zero, since the coefficients
are all positive for $u < (k + 1)/2$ and all negative for $u > (k + 1)/2$. Hence

(22)  $L(k) - L(k + 1) \geq 0.$


Further, by the corollary, if the $\rho_u$ are strongly monotone, $L(k)$ is strongly monotone.
Since all $\rho_u$ are positive, this result is sufficient to prove that

(23)  $1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u \leq 1 - \frac{2}{nk(nk - 1)} \sum_{u=1}^{nk-1} (nk - u)\rho_u.$

Consequently, for any size of sample the average variance of the stratified sample
cannot exceed that of the random sample. Further, the relative efficiency of the
stratified sample to the random sample is monotone increasing with decreasing
size of stratum, i.e. with increasing size of sample. There is, of course, nothing
unexpected in these results. Equation (22) also establishes the result mentioned
in the third section, that with monotone decreasing $\rho_u$ the average variance within
strata increases steadily as the size of stratum increases. For if $n(k - 1)$ degrees
of freedom are assigned to the sum of squares within strata, formula (4)
above shows that the average variance within strata is

(24)  $\sigma^2\left\{1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u\right\} = \sigma^2\{1 - L(k)\}.$


9. Comparison of the systematic and random samples. Upon investigation,
it is soon evident that no general results can be established about the efficiency
of the systematic sample relative to the random samples, unless further restrictions
are made on the form of the population. In order to apply the lemma, we
find the sums of the first $i$ coefficients of the linear functions of $\rho$ which appear
in the variance formulae (7), (11) and (15). By elementary methods these sums
are found to be

(25)  $\Sigma_r = \frac{i(2nk - i - 1)}{nk(nk - 1)},$

$\qquad \Sigma_{st} = \frac{i(2k - i - 1)}{k(k - 1)}, \ (i < k); \qquad \Sigma_{st} = 1, \ (i \geq k),$

$\qquad \Sigma_{sy} = \frac{i(2nk - i - 1)}{nk(k - 1)} - \frac{rk(2n - r - 1)}{n(k - 1)},$

where $r$ is the integer such that $(r + 1)k > i \geq rk$.
From the lemma, in order to establish $\sigma_{sy}^2 \leq \sigma_{st}^2$, it would be necessary to show
that $\Sigma_{sy} \geq \Sigma_{st}$ for any $i$. Now if $i$ is less than $k$, so that $r$ is zero, clearly

(26)  $\Sigma_{sy} > \Sigma_{st} > \Sigma_r,$

except when $n$ is 1, in which case all three are equal.
But if $i$ is an integral multiple of $k$, say $rk$, we find

(27)  $\Sigma_r = \frac{r}{n}\left\{1 + \frac{(n - r)k}{nk - 1}\right\}, \quad \Sigma_{st} = 1, \quad \Sigma_{sy} = \frac{r}{n},$

so that

(28)  $\Sigma_{st} \geq \Sigma_r \geq \Sigma_{sy}.$


Consequently the conditions of the lemma are not satisfied with regard to the
systematic sample and no general theorem exists for all populations with monotone
decreasing $\rho$. The result (26) and the corollary show that for any population
in this class which has $\rho_u = 0$, $u > (k - 1)$, the systematic sample is more
efficient than the stratified random sample. On the other hand, (28) shows that
in a population with the first $k$ of the $\rho$'s equal and the rest zero, the systematic
sample has a higher variance than a random sample. If these two results are
collated for a population with the first $j$ of the $\rho$'s equal and the rest zero, we see
that the systematic sample with stratum size $j$ is less accurate than the comparable
random sample, while the systematic sample with stratum size $(j + 1)$ is
more accurate than the comparable stratified random sample. Although such
a population may not occur in practice, the result suggests that the graph of the
variance of the mean against the size of sample is unlikely to exhibit the same
regularity for the systematic as for the random samples.


10. Populations in which the correlogram is concave upwards. Further
investigation shows that the deciding factors in determining the relative accuracies
of the systematic and random samples are the second differences of the $\rho_u$
rather than the first differences. The following result will be proved.
THEOREM: For all infinite populations in which

$\rho_i \geq \rho_{i+1} \geq 0, \quad i = 1, 2, \ldots, (kn - 1),$

and

$\delta_i^2 = \rho_{i+1} + \rho_{i-1} - 2\rho_i \geq 0, \quad i = 2, 3, \ldots, (kn - 2),$

then

$\sigma_{sy}^2 \leq \sigma_{st}^2 \leq \sigma_r^2$

for any size of sample. Further, $\sigma_{sy}^2 < \sigma_{st}^2$, unless $\delta_i^2 = 0$, $i = 2, 3, \ldots, (kn - 2)$.
This result can be proved by expressing the linear functions of the $\rho_u$ in terms
of second differences and establishing a new lemma applicable to second differences.
An alternative approach is simpler and perhaps more instructive.
Since the $\rho_u$ are monotone decreasing, $\sigma_{st}^2 \leq \sigma_r^2$ by the results in section 8. In
(13) above, the variance of the mean of a systematic sample for a specified finite
population was expressed as

(29)  $\frac{1}{kn} \sum_{i=1}^{kn} (x_i - \bar{x})^2 - \frac{1}{kn}(\text{Total S.S. within systematic samples})$

$\qquad = \frac{1}{kn} \sum_{i=1}^{kn} (x_i - \bar{x})^2 - \frac{1}{n}(\text{Average S.S. within a systematic sample}).$

A corresponding equation holds for stratified random samples. For if $x_{1j},
x_{2j}, \ldots, x_{nj}$ are the elements of any stratified random sample with mean $\bar{x}_{st}$,

(30)  $\sum_{i=1}^{n} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{n} (x_{ij} - \bar{x}_{st})^2 + n(\bar{x}_{st} - \bar{x})^2.$

Now take the average over all $k^n$ samples. This gives

(31)  $\frac{1}{k} \sum_{i=1}^{kn} (x_i - \bar{x})^2 = (\text{Average S.S. within samples}) + nE(\bar{x}_{st} - \bar{x})^2.$

Since the term on the extreme right is $n$ times the variance of the stratified
random sample, a result analogous to (29) follows at once.
Consequently, $\sigma_{sy}^2 \leq \sigma_{st}^2$ if the average sum of squares within a systematic
sample is greater than or equal to that within a stratified random sample. Now
by (2), with $n$ in place of $(kn)$, each of these averages is equal to

(32)  $\frac{(n - 1)}{2} E(x_{ij} - x_{lj})^2,$

where $x_{ij}$, $x_{lj}$ are the elements in the sample from the $i$th and the $l$th strata
respectively, the average being taken over all possible pairs of strata.

We consider a fixed pair of strata and let $l - i = u$. For the systematic
sample, corresponding elements in the $i$th and $l$th strata are always $(ku)$ elements
apart. Hence,

(33)  $E_{sy}(x_{ij} - x_{lj})^2 = 2\sigma^2(1 - \rho_{ku}).$


For the stratified random sample, there are $k^2$ possible pairs of elements from
the two strata. One pair is $(ku - k + 1)$ elements apart, two pairs are
$(ku - k + 2)$ elements apart, and so on, the numbers of pairs rising linearly to
$k$ and then decreasing linearly to one for the final pair which are $(ku + k - 1)$
elements apart. This gives

(34)  $E_{st}(x_{ij} - x_{lj})^2 = 2\sigma^2\left\{1 - \frac{1}{k^2} \sum_{i=-(k-1)}^{(k-1)} (k - |i|)\rho_{ku+i}\right\}.$

Hence, to complete the proof that $\sigma_{sy}^2 \leq \sigma_{st}^2$, it is sufficient to show that

(35)  $\sum_{i=-(k-1)}^{(k-1)} (k - |i|)(\rho_{ku+i} - \rho_{ku}) \geq 0$

for $u = 1, 2, \ldots, (n - 1)$, that is, for any pair of strata. This may be written

(36)  $\sum_{i=1}^{(k-1)} (k - i)(\rho_{ku+i} + \rho_{ku-i} - 2\rho_{ku}) \geq 0.$

But if $\delta_{ku}^2 = \rho_{ku+1} + \rho_{ku-1} - 2\rho_{ku}$ is the second central difference it is easy to
show that

(37)  $\rho_{ku+i} + \rho_{ku-i} - 2\rho_{ku} = \sum_{j=-(i-1)}^{(i-1)} (i - |j|)\delta_{ku+j}^2 \geq 0,$

since by hypothesis $\delta_i^2 \geq 0$, $i = 2, 3, \ldots, (kn - 2)$. This proves that the
variance between the elements of the systematic sample is greater than or equal
to that between the elements of the stratified random sample for any fixed pair
of strata. The result for the overall average follows. Hence $\sigma_{sy}^2 \leq \sigma_{st}^2$.
Further, unless $\delta_j^2 = 0$, for all $j$, clearly $\sigma_{sy}^2 < \sigma_{st}^2$, except for samples of one.
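
The identity behind the last step of the proof, namely that a symmetric difference of the correlogram about lag $ku$ is a positively weighted sum of second central differences, can be checked numerically. The sketch below does so for an exponential correlogram with an arbitrarily chosen decay rate; the function names are assumptions introduced here, and the identity itself holds for any sequence, concave or not.

```python
import math


def second_difference(rho, i):
    """Second central difference rho_{i+1} + rho_{i-1} - 2 rho_i."""
    return rho(i + 1) + rho(i - 1) - 2.0 * rho(i)


def spread_difference(rho, centre, i):
    """rho_{c+i} + rho_{c-i} - 2 rho_c, with c = ku the lag between strata."""
    return rho(centre + i) + rho(centre - i) - 2.0 * rho(centre)


def weighted_second_differences(rho, centre, i):
    """Sum over j = -(i-1)..(i-1) of (i - |j|) times the second difference at c + j."""
    return sum((i - abs(j)) * second_difference(rho, centre + j)
               for j in range(-(i - 1), i))


def weighted_spread_sum(rho, k, u):
    """Sum over i = 1..k-1 of (k - i) times the spread difference about lag ku."""
    return sum((k - i) * spread_difference(rho, k * u, i) for i in range(1, k))


if __name__ == "__main__":
    rho = lambda lag: math.exp(-0.3 * lag)  # hypothetical concave-upwards correlogram
    k, u = 5, 2
    for i in range(1, k):
        print(i, spread_difference(rho, k * u, i),
              weighted_second_differences(rho, k * u, i))
    print("weighted sum:", weighted_spread_sum(rho, k, u))
```

For a concave-upwards correlogram every second difference is non-negative, so each spread difference, and hence the weighted sum over $i$, is non-negative as well.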

The essential point in the proof may be put as follows. The elements in the
$i$th and $l$th strata are on the average $(ku)$ elements apart for both the systematic
and the stratified random sample. When two elements in the latter sample are
$(ku + i)$ elements apart, they are less correlated than on the average, since
$\rho_{ku+i} \leq \rho_{ku}$, and thus provide more independent information. The variance
between the elements exceeds the systematic sample variance by
$2\sigma^2(\rho_{ku} - \rho_{ku+i})$. However, such cases are counterbalanced by an equal number
of cases in which the elements differ by $(ku - i)$ and the variance is below
the systematic sample variance by $2\sigma^2(\rho_{ku-i} - \rho_{ku})$. Because of the concavity
of $\rho_u$, the losses on the average balance or outweigh the gains.

For the population discussed in section 9, in which $\rho_u = \rho$, $u = 1, 2, \ldots, j$,
$\rho_u = 0$, $u > j$, we have $\delta_j^2 < 0$, $\delta_{j+1}^2 > 0$, and $\delta_i^2 = 0$ otherwise. This reversal
of the sign of the second difference is the explanation for the anomalous behavior
of the systematic samples with stratum sizes $j$ and $(j + 1)$.
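
The anomalous behavior for this step-shaped correlogram is easy to reproduce. The following sketch evaluates the linear functions of the $\rho$'s from the three average-variance formulas for stratum sizes $j$ and $(j + 1)$, together with the second differences. The values $j = 4$, $\rho = 0.5$ and $n = 5$ are arbitrary illustrative choices, and the function names are assumptions.

```python
def step_correlogram(j, rho=0.5):
    """rho_u = rho for u = 1..j and 0 for u > j."""
    return lambda u: rho if 1 <= u <= j else 0.0


def brackets(rho, n, k):
    """Bracket factors for the random, stratified and systematic samples.

    Each average variance is (sigma^2 / n)(1 - 1/k) times the matching factor.
    """
    N = n * k
    all_lags = sum((N - u) * rho(u) for u in range(1, N))
    within = sum((k - u) * rho(u) for u in range(1, k))
    multiples = sum((n - u) * rho(k * u) for u in range(1, n))
    random_ = 1.0 - 2.0 * all_lags / (N * (N - 1))
    stratified = 1.0 - 2.0 * within / (k * (k - 1))
    systematic = (1.0 - 2.0 * all_lags / (N * (k - 1))
                  + 2.0 * k * multiples / (n * (k - 1)))
    return random_, stratified, systematic


def second_differences(rho, upto):
    """Second central differences for i = 2..upto, keyed by i."""
    return {i: rho(i + 1) + rho(i - 1) - 2.0 * rho(i) for i in range(2, upto + 1)}


if __name__ == "__main__":
    j, n = 4, 5
    rho = step_correlogram(j)
    print("k = j:    ", brackets(rho, n, j))      # systematic worse than random
    print("k = j + 1:", brackets(rho, n, j + 1))  # systematic better than stratified
    print(second_differences(rho, j + 3))
```

With stratum size $j$ the systematic factor exceeds the random one, while with stratum size $(j + 1)$ it falls below the stratified one, in line with the sign reversal of the second difference at lags $j$ and $(j + 1)$.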

The theorem above does not prove that the relative accuracy of the systematic
to the stratified random sample is a monotone function of $n$, nor even that $\sigma_{sy}^2$
decreases steadily as $n$ increases. Actually, there are populations in the class for
which neither result holds, as will be illustrated in the next section.

So far as practical applications are concerned, the restriction that the $\rho_u$ should
be concave upwards may not be severe. For instance, this condition is satisfied
when the correlogram is linear, i.e. $\rho_u = (l - u)/l$, this being one type of correlogram
which Wold [6] has considered applicable to economic data. Concavity
also holds for the function $\rho_u = e^{-\lambda u}$ which Osborne [7] has suggested for forestry
and land-use surveys and for the relation $\rho_u = \tanh(u^{-3/5})$ which Fisher and
Mackenzie [8] used for expressing the correlation between the weekly rain at
two weather stations as a function of their distance apart. In fact, if $\rho_u$ is
conceived of as positive and continuous for all $u$, a concave upwards function
suggests itself naturally.


11. Linear correlograms. It may be of interest to present some results obtained
when the correlogram is (i) linear, (ii) exponential, since both types have
been suggested as possible models for populations occurring in practice.

In the linear case,

(38)  $\rho_u = (L - u)/L, \ u \leq L; \qquad \rho_u = 0, \ u > L.$

If $L \geq (nk - 1)$, the correlogram is a straight line throughout the whole range
of the finite population. Since all second differences are zero in this case, we may
expect $\sigma_{sy}^2 = \sigma_{st}^2 < \sigma_r^2$. If $L < (nk - 1)$, all second differences vanish except
$\delta_L^2$, which is positive. Hence we may expect $\sigma_{sy}^2 < \sigma_{st}^2 < \sigma_r^2$.

The results for these cases are found by elementary summations from the basic
formulae (7), (11) and (15). Details of the summations will not be presented.
For $L \geq (nk - 1)$, we find

(39)  $\sigma_{sy}^2 = \sigma_{st}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\frac{(k + 1)}{3L}, \qquad \sigma_r^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\frac{(nk + 1)}{3L}.$

The ratio $\sigma_r^2/\sigma_{sy}^2$ is $(nk + 1)/(k + 1)$, which is approximately equal to $n$, the size
of sample, unless the percentage sampled is large. Thus very large gains in
efficiency over random sampling are obtained.
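If $L < (nk - 1)$, the formulae are less simple. Consider first $k > L$; that is,
cases where the percentage sampled is less than $100/L$. If $N = nk$,

(40)  $\sigma_r^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{\frac{3N(N - L) + (L^2 - 1)}{3N(N - 1)}\right\},$

(41)  $\sigma_{st}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{\frac{3k(k - L) + (L^2 - 1)}{3k(k - 1)}\right\}, \quad k > L,$

(42)  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{\frac{3N(k - L) + (L^2 - 1)}{3N(k - 1)}\right\}, \quad k > L.$

It is clear on inspection that $\sigma_{sy}^2 < \sigma_{st}^2$; moreover, it is easy to show that the
efficiency of systematic relative to stratified random sampling increases steadily
as the size of sample increases.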

When the size of sample is increased further so that $k \leq L$, formula (40)
remains unchanged, while $\sigma_{st}^2$ is now given by the same formula as in (39). The
formula for $\sigma_{sy}^2$ is more complex. If $q$ is the integral part of the quotient when $L$
is divided by $k$ and $r$ is the remainder, so that $L = (qk + r)$, the formula may be
written

(42')  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{\frac{qk(k^2 - 1) + 3rk(n - q)(k - r) + (r^3 - r)}{3NL(k - 1)}\right\}, \quad k \leq L.$

It is noteworthy that the last two terms in the numerator inside the curly
bracket vanish whenever $L$ is exactly divisible by $k$. Further, the second term is
of order $nk = N$ and, when present, exerts a much greater weight than the first
term. Thus $\sigma_{sy}^2$ takes a sudden dip whenever $L$ is a multiple of $k$. In fact, for
$L = qk$, (42') reduces to

(43)  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\frac{(k + 1)}{3N},$

so that the variance goes to zero if $N$ is sufficiently large. By comparison with
formula (39) for $\sigma_{st}^2$ we see that when $L = qk$ the relative efficiency of systematic
to stratified random sampling is $N/L$, which increases beyond bound if $N$ is
sufficiently large. In intermediate cases, when the remainder $r$ does not vanish,
the leading term in the relative efficiency for $N$ large is $(k^2 - 1)/3r(k - r)$.
This varies somewhat irregularly, depending on the relation between $L$ and $k$.

To illustrate, numerical values are given below when $L = 10$ and the finite
population is large enough so that terms in $1/n$ are negligible.

The quantities $v_{st}$, $v_{sy}$ are the corresponding variances apart from a factor
$\sigma^2/N$. The stratified sample variance decreases steadily with increasing percentage
sampled. On the other hand the systematic sample variance goes to
zero and the relative efficiency to infinity when $k$ is 2, 5 or 10. Moreover, in the
intermediate cases $k = 3, 4, 6, 7, 8, 9$, the variance and the relative efficiency
show no consistent relation to the percentage sampled. For samples of less than
10 per cent, including the cases outside the limits of the table, the relative
efficiency decreases steadily from 4 at $k = 11$ to 1 when $k$ is large.
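
The closed forms for the linear correlogram can be evaluated directly. The sketch below codes the bracket factors of $\sigma_{st}^2$ and $\sigma_{sy}^2$ for $L < (nk - 1)$, cross-checks the systematic factor against a term-by-term summation of the general systematic-sample formula, and sweeps $k$ for $L = 10$. The function names are assumptions, and $n = 1000$ is an arbitrary stand-in for a large population, so the printed values only approximate the limiting case described in the text.

```python
def linear_rho(u, L):
    """Linear correlogram: (L - u)/L up to lag L, zero beyond."""
    return (L - u) / L if u <= L else 0.0


def st_bracket(k, L):
    """Bracket factor of sigma_st^2: (k + 1)/3L if k <= L, else the k > L form."""
    if k <= L:
        return (k + 1) / (3 * L)
    return (3 * k * (k - L) + L * L - 1) / (3 * k * (k - 1))


def sy_bracket(n, k, L):
    """Bracket factor of sigma_sy^2, assuming L < nk - 1."""
    N = n * k
    if k > L:
        return (3 * N * (k - L) + L * L - 1) / (3 * N * (k - 1))
    q, r = divmod(L, k)  # L = qk + r
    numerator = q * k * (k * k - 1) + 3 * r * k * (n - q) * (k - r) + (r ** 3 - r)
    return numerator / (3 * N * L * (k - 1))


def sy_bracket_direct(n, k, L):
    """The same factor summed term by term, as a cross-check."""
    N = n * k
    all_lags = sum((N - u) * linear_rho(u, L) for u in range(1, N))
    multiples = sum((n - u) * linear_rho(k * u, L) for u in range(1, n))
    return 1 - 2 * all_lags / (N * (k - 1)) + 2 * k * multiples / (n * (k - 1))


if __name__ == "__main__":
    L, n = 10, 1000
    for k in (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20):
        v_st = (k - 1) * st_bracket(k, L)      # variance apart from sigma^2 / N
        v_sy = (k - 1) * sy_bracket(n, k, L)
        print(k, round(v_st, 3), round(v_sy, 3), round(v_st / v_sy, 2))
```

Because $n$ is finite in this sketch, the relative efficiency at $k = 2$, 5 and 10 comes out as the large but finite value $N/L$ rather than infinity.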


TABLE 1
Variances except for a factor $\sigma^2/N$ and relative efficiency for systematic and stratified
random samples for a linear correlogram

k | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 20


12. Exponential correlograms. For the exponential $\rho_u = e^{-\lambda u}$ the results are
much more regular. Each of the linear functions of the $\rho$'s consists of a finite
number of terms of an expansion of the form $(1 - x)^{-2}$. If


(44)  $f(N, \lambda) = \frac{2}{N(N - 1)} \sum_{u=1}^{N-1} (N - u)e^{-\lambda u},$

which is the sum for $\sigma_r^2$, we find

(45)  $\sigma_r^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\{1 - f(N, \lambda)\},$

(46)  $\sigma_{st}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\{1 - f(k, \lambda)\},$

(47)  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left\{1 - \frac{(N - 1)}{(k - 1)} f(N, \lambda) + \frac{k(n - 1)}{(k - 1)} f(n, k\lambda)\right\}.$

It may be shown that the variance of the systematic sample decreases steadily
and its efficiency relative to stratified sampling increases steadily as the sample
becomes larger.

In order to obtain some idea of the magnitude of the gain in efficiency, consider
the case where $k$ and $n$ are large. For this case the relative efficiency, which
actually is a function of $k$, $n$ and $\lambda$, turns out to depend almost entirely on the
single quantity $(k\lambda)$; or, equally, on the correlation $e^{-k\lambda}$ between the items in
successive strata in the systematic sample. If $t = (k\lambda)$, we obtain $\sigma_r^2 = \sigma^2/n$,

(48)  $\sigma_{st}^2 = \frac{\sigma^2}{n}\left\{1 - \frac{2}{t} + \frac{2(1 - e^{-t})}{t^2}\right\},$

(49)  $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left\{1 - \frac{2}{t} + \frac{2}{(e^t - 1)}\right\}.$

The relative efficiency is given in Table 2 for a selection of values of $e^{-t}$, the
correlation between the items in successive strata.
The relative efficiency has a limiting value 2 when $\rho$ tends to 1 and decreases
slowly towards 1 as $\rho$ falls to zero. The gains in efficiency are quite substantial
if $\rho$ exceeds 0.1.


TABLE 2

Relative efficiency of systematic and stratified random samples for an exponential
correlogram

$\rho$                        | .9   | .8   | .7   | .6   | .5   | .4   | .3   | .2   | .1
$\sigma_{st}^2/\sigma_{sy}^2$ | 1.96 | 1.90 | 1.84 | 1.78 | 1.71 | 1.64 | 1.55 | 1.46 | 1.33
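
The large-sample ratio $\sigma_{st}^2/\sigma_{sy}^2$ depends only on $t = k\lambda$, so it can be tabulated in a few lines. The sketch below takes the stratified factor as $1 - 2/t + 2(1 - e^{-t})/t^2$ and the systematic factor as $1 - 2/t + 2/(e^t - 1)$, with $\rho = e^{-t}$; the function names are assumptions. Values computed this way agree closely with Table 2, though small differences in the second decimal place should not be surprising given the hand computation of the period.

```python
import math


def st_factor(t):
    """Large-sample factor of sigma_st^2: 1 - 2/t + 2(1 - exp(-t))/t^2."""
    return 1.0 - 2.0 / t - 2.0 * math.expm1(-t) / t ** 2


def sy_factor(t):
    """Large-sample factor of sigma_sy^2: 1 - 2/t + 2/(exp(t) - 1)."""
    return 1.0 - 2.0 / t + 2.0 / math.expm1(t)


def relative_efficiency(rho):
    """sigma_st^2 / sigma_sy^2 as a function of rho = exp(-t), 0 < rho < 1."""
    t = -math.log(rho)
    return st_factor(t) / sy_factor(t)


if __name__ == "__main__":
    for rho in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
        print(rho, round(relative_efficiency(rho), 2))
```

`math.expm1` is used instead of `math.exp(t) - 1` to limit cancellation error when $t$ is small, that is, when $\rho$ is close to 1.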


It was pointed out in section 1 that no unbiased estimate of error is available 
from a single sample for either the systematic or the stratified random sample. 
This does not mean that no estimate of error can be attempted. However, any 
estimate must depend on certain assumptions about the form of the population 
which is being sampled and is likely to be vitiated insofar as these assumptions 
are false. If, for instance, the correlogram were assumed to be exponential, 
formula (47), or (49) in the particular case with n, k large, would appear to be 
the appropriate basis for the estimation of error from a single systematic sample. 
Consider the simpler case in which (49) is valid. The correlation between
successive items in the systematic sample provides an estimate of $e^{-t}$ and hence
of $t$. Also, if terms in $1/n$ are negligible, the mean square within the systematic
sample is found to be an unbiased estimate of $\sigma^2$. By substitution in (49) a
consistent estimate of the variance of a single systematic sample would be secured,
provided that the exponential assumption were correct. The gains in efficiency
over stratified and random sampling could also be estimated.


OPERATING CHARACTERISTICS FOR THE COMMON STATISTICAL
TESTS OF SIGNIFICANCE


By Charles D. Ferris, Frank E. Grubbs, Chalmers L. Weaver


Ballistic Research Laboratory, Aberdeen Proving Ground


1. Summary. Methods making possible quick calculation of operating characteristics
or power curves of common tests of significance involving the $\chi^2$,
$F$, $t$, and normal distributions are presented. In addition, a comprehensive set
of curves illustrating graphically the power of each test for the 5% significance
level is included. We are interested in the power of: (1) the $\chi^2$-test to determine
whether an unknown population standard deviation is greater or less than a
standard value, (2) the $F$-test to determine whether one unknown population
standard deviation is greater than another (one-sided alternative), and (3) the
$t$-test and normal test to determine whether an unknown population mean
differs from a standard or two unknown population means differ from each other.
Such operating characteristics have application for the quality control engineer
and statistician in the design of sampling inspection plans using variables where
they may be used to determine the sample size that will guarantee a specified
consumer's and producer's risk. On the other hand they are of use in displaying
the power of a test if the sample size has already been set. Finally, they are a
necessary adjunct to the proper interpretation of the common tests of significance.


2. Introduction. In the application of the common statistical tests of sig- 
nificance there has been a great need for readily accessible information on the 
power of the test employed to distinguish between the null hypothesis and perti- 
nent alternative hypotheses for given sample size. In this connection, two im- 
portant applications arise. On one hand it becomes important for the sampler 
to know, for a given sample size and critical region, something about the power 
of the test in rejecting the stated hypothesis when some alternative hypothesis is 
true. On the other hand, if the sampler wants a given degree of assurance in 
rejecting the null hypothesis when a particular alternative is true, he would like 
to know the minimum sample size which would accomplish this when the prob- 
ability of rejecting the null hypothesis when true is given. In particular, the 
need for such information arises most frequently in setting sample sizes to dis- 
tinguish effectively, on the basis of single sample results, between (1) population 
standard deviations and (2) population means. If the sample size has already 
been set, as is the case with most specifications, quick information on whether 
or not it is large enough to keep the risk of accepting poor material down to a 
reasonable figure is highly desirable. Such probabilities will be recognized, of 
course, as the Type I and Type II errors of the Neyman-Pearson theory. Such
risks must be given proper consideration in the interpretation of a significance
test or in designing the provisions of an acceptance test.

Needless to say, the appropriate expressions for the power functions of the
$\chi^2$-test, $F$-test, normal-test, and $t$-test have been derived at one time or another
in the literature. However, insofar as the practical statistician or quality control
engineer is concerned, such information has not been employed to advantage
widely since no informative graphs or extensive tables of power functions for the
common statistical tests of significance have been presented. Due to the practical
importance of questions of this type, the authors believe there is need for
operating characteristics or graphical power functions of the common statistical
tests of significance. This paper supplies such a need over a useful range of
sample sizes and alternative hypotheses for the 5% significance level.


3. Definitions. In the following account, we will refer to one or both of the
normal populations, $\pi_1$ and $\pi_2$. We will let $x_1$ be a variate from $\pi_1$ whose expected
value or mean is $\mu_1$ and standard deviation $\sigma_1$. By $n_1$ we will mean the number
of observations drawn at random from $\pi_1$ and our sample statistics will be
defined in the usual fashion:

$\bar{x}_1 = \sum_1^{n_1} x_1/n_1, \qquad s_1^2 = \sum_1^{n_1} (x_1 - \bar{x}_1)^2/(n_1 - 1).$

Similar definitions apply to the normal population $\pi_2$ with the appropriate
subscript for sample statistics and population values. In dealing with a single
population we will drop the subscripts from the sample statistics.
We also define

$\sigma_0$ = a standard or arbitrary value of the standard deviation,

$a$ = a standard or given level,

$s^2 = \frac{\sum_1^{n_1} (x_1 - \bar{x}_1)^2 + \sum_1^{n_2} (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$ when two normal populations are encountered.

$H_0$ will be used to denote the null hypothesis and $H_1$ any one of a set of alternative
hypotheses. The probability of rejecting the null hypothesis $H_0$ when
it is true (Type I error) will be denoted by $\alpha$, and the probability of accepting the
null hypothesis when some alternative hypothesis $H_1$ is true (Type II error)
will be denoted by $\beta$.

4. Power function of the $\chi^2$-test. The statistic $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$ (dropping
subscripts of sample statistics) is used to accept or reject the hypothesis that the
standard deviation, $\sigma_1$, of the normal population sampled is some specified or
given value, $\sigma_0$.

Our hypotheses are

$H_0$: $\sigma_1 = \sigma_0$

$H_1$: $\sigma_1 = \lambda\sigma_0$, $(\lambda > 0)$.

A. To determine whether or not $\sigma_1 > \sigma_0$. We choose a significance level, $\alpha$,
and compute $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$. If $\chi^2 > \chi_\alpha^2$, where the percentage point $\chi_\alpha^2$ is
determined by

(1)  $\frac{1}{2^{(n-1)/2}\,\Gamma\left(\frac{n - 1}{2}\right)} \int_{\chi_\alpha^2}^{\infty} u^{(n-3)/2} e^{-u/2}\,du = \alpha,$

we reject $H_0$ and conclude that $\sigma_1 > \sigma_0$.
To set up the power function we note that: 


If Ho is true 
pr{( — — 


oC 





If HA; is true 
2 
pri @— Ue 5 mis, (1 — B =a, if = 1), 


However, since 


2 
Pr eS —- > xis} =1- 8 


oj 


. 2 


o 


2 3 
Vxig = x2 or A= A/ . 
Xi-8 


Therefore, for a given significance level, $\alpha$ (Type I error), and various Type II
errors, $\beta$, we can make use of the Tables of Percentage Points of the $\chi^2$-distribution
[1] and compute enough of the points $(\lambda, \beta)$ to plot the power curves
depicted in Fig. 1. The Type I error, $\alpha$, has been set at the practical level of .05
for Fig. 1.
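B. To detect $\sigma_1 < \sigma$. We compute

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

and if $\chi^2 < \chi^2_{1-\alpha}$ we reject $H_0$, concluding that $\sigma_1 < \sigma$.
By reasoning similar to that in A. we arrive at the relationship

$$\chi^2_{1-\alpha} = \lambda^2 \chi^2_\beta \quad \text{or} \quad \lambda = \sqrt{\chi^2_{1-\alpha} / \chi^2_\beta}.$$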


Again, by use of the Table of Percentage Points of the $\chi^2$-Distribution the operating
characteristics of Fig. 2 are obtained. We have chosen the practical level of
$\alpha = .05$ for Fig. 2.
Example: A Rifle Association is purchasing small arms ammunition for
match purposes. It is the desire of the rifle club that the dispersion in muzzle
velocity of a lot of ammunition intended for match purposes be kept down to a
practical minimum. Acceptance or rejection of an ammunition lot must, of
course, be made on a sampling basis since the ballistic¹ acceptance test is
destructive in nature. Moreover, for practical reasons acceptance of a given lot
is to be on the basis of a single sample. The Association specifies that they are
not willing to accept more than 5% of the lots whose standard deviation in
muzzle velocity is 6 ft./sec. The ammunition manufacturer agrees that he will
accept these terms provided not more than 5% of the lots whose standard deviation
in muzzle velocity is 4 ft./sec. will be rejected. Under these agreements,
it is desired to know what sample size is necessary to provide the stated assurances
for the Rifle Association and the ammunition manufacturer.

In this problem, $\alpha = .05$, $\beta = .05$, and $\lambda = 1.5$. Referring to Fig. 1, we
find the required sample size is approximately 35.
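The sample size read from Fig. 1 can be checked numerically from the relation $\lambda = \sqrt{\chi^2_\alpha / \chi^2_{1-\beta}}$. The sketch below is only illustrative: in place of the printed tables of [1] it assumes the Wilson-Hilferty cube-root approximation to the percentage points of the $\chi^2$-distribution, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_upper_point(p: float, df: int) -> float:
    """Approximate value exceeded with probability p by chi-square with df d.f.

    Assumption of this sketch: the Wilson-Hilferty cube-root approximation
    stands in for the printed tables of percentage points.
    """
    z = NormalDist().inv_cdf(1.0 - p)  # normal deviate with upper-tail area p
    c = 2.0 / (9.0 * df)
    base = max(1.0 - c + z * sqrt(c), 1e-12)  # keep the cube positive for tiny df
    return df * base ** 3


def detectable_ratio(n: int, alpha: float, beta: float) -> float:
    """lambda = sqrt(chi2_alpha / chi2_(1-beta)) with n - 1 degrees of freedom."""
    df = n - 1
    return sqrt(chi2_upper_point(alpha, df) / chi2_upper_point(1.0 - beta, df))


def smallest_sample_size(lam: float, alpha: float, beta: float) -> int:
    """Smallest n whose detectable ratio does not exceed lam (lam > 1)."""
    n = 2
    while detectable_ratio(n, alpha, beta) > lam:
        n += 1
    return n


# Rifle Association example: alpha = beta = .05, lambda = 6/4 = 1.5.
n_required = smallest_sample_size(1.5, 0.05, 0.05)
```

Under this approximation the search stops at a sample size in agreement with the value of about 35 read from the chart.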

On the other hand, if a sample size had already been set, the appropriate 
curve in Fig. 1 could be examined to determine whether it provided sufficient 
protection against the acceptance of inferior ammunition. 






























5. Power function of the F-test. In discussing the power function of the 
F-test we will focus our attention on the problem of comparing the standard 
deviations of two normal populations. 

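A. To determine whether or not the standard deviation, $\sigma_1$, of one normal
population is greater than the standard deviation, $\sigma_2$, of another normal
population. We choose a significance level, $\alpha$, and compute $F = s_1^2/s_2^2$. If $F > F_\alpha$,
where the percentage point $F_\alpha$ is determined by

$$\frac{\Gamma\left[\frac{1}{2}(n_1 + n_2 - 2)\right]}{\Gamma\left[\frac{1}{2}(n_1 - 1)\right]\Gamma\left[\frac{1}{2}(n_2 - 1)\right]} (n_1 - 1)^{\frac{1}{2}(n_1 - 1)} (n_2 - 1)^{\frac{1}{2}(n_2 - 1)} \int_{F_\alpha}^{\infty} \frac{u^{\frac{1}{2}(n_1 - 3)}\, du}{\left[(n_1 - 1)u + n_2 - 1\right]^{\frac{1}{2}(n_1 + n_2 - 2)}} = \alpha, \tag{2}$$

we conclude that $\sigma_1 > \sigma_2$.
Our hypotheses are

$$H_0: \sigma_1 = \sigma_2$$

$$H_1: \sigma_1 = \lambda\sigma_2, \quad (\lambda > 1).$$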


To set up the power function of the F-test we note that:
If $H_0$ is true

$$\Pr\{s_1^2/s_2^2 > F_\alpha\} = \alpha.$$

If $H_1$ is true

$$\Pr\{s_1^2/s_2^2 > F_\alpha\} = 1 - \beta, \quad (1 - \beta = \alpha \text{ if } \lambda = 1).$$

However, since

$$\Pr\left\{\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} > F_{1-\beta}\right\} = 1 - \beta,$$

or

$$\Pr\{s_1^2/s_2^2 > \lambda^2 F_{1-\beta}\} = 1 - \beta,$$

we have the relation $\lambda^2 F_{1-\beta} = F_\alpha$ or $\lambda = \sqrt{F_\alpha / F_{1-\beta}}$.

¹ This example is used to illustrate the use of the power of the $\chi^2$-test and is not
advocated as a most powerful sampling technique. (See ref. [10]).

Therefore, for a given Type I error, $\alpha$, and various Type II errors, $\beta$, we can
make use of the Table of Percentage Points of the F-Distribution [2] and compute
sufficient points $(\lambda, \beta)$ to plot the operating characteristics depicted in
Figs. 3, 4, and 5. In these figures, $\alpha$ has been set at the practical level of .05.

It should be emphasized that the operating characteristics presented in this
paper are applicable only when one is interested in the one-sided alternative that
$\sigma_1 > \sigma_2$ and not $\sigma_1 < \sigma_2$. Under these circumstances, the exact formation of the
F ratio will be set beforehand and will not depend upon test results (for example,
placing the greatest mean square in the numerator). In those cases where one
is interested in the two-sided alternative, a two-tail F-test such as described by
H. Scheffé [3] should be used. It is hoped that at a later date operating
characteristics of such a test calculated in a manner similar to the example in [3]
will be presented.

Example: It became necessary for a manufacturer to make a choice between
a new type casting and one produced under standard design practices. One of
the bases of comparison was dispersion in tensile strength. It was considered
that if the standard deviation of the standard casting were larger than the new
type, definite preference should be given to the latter. When the question of a
practical criterion for rejecting the standard casting was considered, it was
decided that if its true standard deviation in tensile strength were actually 1½
times that of the new type there should be a 90% chance of rejection. It would
be of little practical importance to detect any ratio less than 1½ in this particular
case. It was also decided that the 5% significance level would suffice insofar
as rejection of equal quality was concerned. A preliminary sample size of 20
was selected, and the question arose as to how well a sample of this size gave the
protection desired.

The question can be answered immediately by reference to Fig. 3 (here $s_1^2$
is computed from the standard casting data, of course) where it is seen that a
sample size of 20 will fail to detect the stated difference 47% of the time. In
order to achieve the desired protection, it is seen at once from Fig. 3 that a
sample size of over 50 will be necessary. The exact sample size, determined
with the aid of the formulas above, is found to be 54.
B. Analysis of variance. We shall consider the analysis of variance layout
where a sample of $n$ items is drawn from each of $m$ normal populations with
common variance $\sigma^2$. It is required to decide on the basis of the sample results
whether or not there is any variation among the true means of the $m$ normal
populations sampled.

Let $x_{ij}$ be the $j$th item drawn at random from the $i$th population,

$$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij} \quad \text{and} \quad \bar{x} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i.$$

The F-test utilizes the comparison of the variation among the sample means
(external variance) with that among the items within the samples (internal
variance) in order to test the equality of population means by making use of the
ratio

$$F = \frac{n \sum (\bar{x}_i - \bar{x})^2 \; m(n - 1)}{\sum (x_{ij} - \bar{x}_i)^2 \; (m - 1)}.$$

If $F > F_\alpha$, where $F_\alpha$ is defined as in 5.A., we conclude that the population
means are not equal.

In our approach we will assume that the $m$ true lot means represent a sample
from a super-population, also normal, with variance equal to $\theta^2\sigma^2$. Since the
sampling variance of the means is $\sigma^2/n$, the total variance among the sample
means equals

$$\sigma^2/n + \theta^2\sigma^2 = \lambda^2\sigma^2/n, \quad (\lambda^2 = 1 + n\theta^2).$$

Hence, our hypotheses are

$$H_0: \theta = 0$$

$$H_1: \theta > 0.$$

Since $F/\lambda^2$ follows the F-distribution with $m - 1$ and $m(n - 1)$ degrees of
freedom the operating characteristic, i.e. the probability for various $\theta$ of accepting
$H_0$, may be obtained from the curves already graphed by setting $n_1 = m$,
$n_2 = nm - m + 1$, and $\lambda^2 = 1 + n\theta^2$.
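The substitution just described is purely mechanical, and a short sketch may make it concrete. The function name and the layout values in the usage line are hypothetical, chosen only for illustration.

```python
from math import sqrt


def anova_to_f_curve(m: int, n: int, theta: float) -> tuple[int, int, float]:
    """Map an m-by-n layout and theta to (n1, n2, lambda) on the F-test curves.

    The F-test curves are indexed by sample sizes n1 and n2 (degrees of
    freedom n1 - 1 and n2 - 1), so m - 1 and m(n - 1) degrees of freedom
    correspond to n1 = m and n2 = n*m - m + 1, with lambda**2 = 1 + n*theta**2.
    """
    n1 = m
    n2 = n * m - m + 1
    lam = sqrt(1.0 + n * theta ** 2)
    return n1, n2, lam


# Illustrative layout: 4 populations, 6 items from each, theta = 0.5.
n1, n2, lam = anova_to_f_curve(m=4, n=6, theta=0.5)
```

With these illustrative values the curve to consult is the one for sample sizes 4 and 21, entered at $\lambda = \sqrt{2.5}$, roughly 1.58.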

In the design of experiments when the number of populations is indefinite
(for example, daily tests) and the total sample size $mn$ is limited, the above
procedure will enable one to determine what values of $m$ and $n$ give the most
powerful operating characteristic for the given amount of sampling. For
example, for $mn = 24$ operating characteristics for all possible pairings were
computed and charted. They were observed to cross one another, each combination
in turn becoming most powerful for a given interval of $\theta$. The following
table gives the best pairings for various intervals of $\theta$:

     m      n      θ
     2     12     .00 - .32
     3      8     .32 - .60
     4      6     .60 - .91
     6      4     .91 - 1.37
     8      3    1.37 - 2.50
    12      2    2.50 -


In contrast to the above discussion, mention should be made of P. C. Tang’s 
approach [4] to the power function of the analysis of variance. The basic differ- 
ence lies in the method of expressing the alternative hypothesis. Tang expresses 
it in terms of the variance of a finite number of population means. We express 
it in terms of normally distributed population means. We believe our approach 
has considerable practical value in control chart analyses where we are interested 
in the quality of the flow of production of a large number of lots. In addition, 
our approach obviates the difficulties imposed by the non-central $\chi^2$-distribution.


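6. Power function of the normal test.
A. The statistic $u = \sqrt{n}(\bar{x} - a)/\sigma_1$ is used to accept or reject the hypothesis
that the mean, $\mu$, of the normal population sampled, is some specified standard
level, $a$, when the population standard deviation is known (for example, from
past data).
Our hypotheses are

$$H_0: \mu = a$$

$$H_1: |\mu - a| = \lambda\sigma_1, \quad (\lambda > 0).$$

To test the hypothesis $\mu = a$, we choose a significance level, $\alpha$, and compute $u$.
If $|u| > u_\alpha$, where the percentage point, $u_\alpha$, is determined by

$$\frac{1}{\sqrt{2\pi}} \int_{-u_\alpha}^{+u_\alpha} e^{-\frac{1}{2}x^2}\, dx = 1 - \alpha, \tag{3}$$

we reject $H_0$ and conclude that $\mu \neq a$.
To set up the power function we note that:
If $H_0$ is true

$$\Pr\{-u_\alpha < u < +u_\alpha\} = 1 - \alpha.$$

If $H_1$ is true

$$\Pr\left\{-u_\alpha < \frac{\sqrt{n}(\bar{x} - a)}{\sigma_1} < +u_\alpha\right\} = \beta, \quad (1 - \beta = \alpha \text{ if } \lambda = 0),$$

or

$$\beta = \Pr\left\{-u_\alpha + \lambda\sqrt{n} < \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma_1} < +u_\alpha + \lambda\sqrt{n}\right\}, \quad \lambda = \frac{|\mu - a|}{\sigma_1}.$$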
In the latter expression the statistic $\sqrt{n}(\bar{x} - \mu)/\sigma_1$ is normally distributed with
zero mean and unit variance. The required probabilities are found easily from
tables of areas under the normal frequency curve. By computing enough
points $(\lambda, \beta)$ the operating characteristics depicted in Fig. 6 were constructed.

It should be noted that the $\beta$ corresponding to a pair of values $n'$ and $\lambda'$ may
be obtained from any other operating characteristic by use of the relation
$\lambda = \lambda'\sqrt{n'/n}$. For example, if it is desired to find the Type II error for a sample
size of $n' = 12$ and $\lambda' = 1$, select any operating characteristic, say for $n = 3$,
as the reference curve, compute $\lambda = 1 \cdot \sqrt{12/3} = 2$, and find from the curve for
$n = 3$ that $\beta = .07$. In Fig. 6, however, individual operating characteristics
are plotted for convenience and to provide a picture of the comparative
efficiency of various sample sizes.
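The operating characteristic of the normal test and the rescaling relation can both be evaluated directly from the normal integral. The following is a minimal sketch using Python's standard library; the function names are hypothetical and $\alpha = .05$ is taken as in Fig. 6.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def normal_test_beta(lam: float, n: int, alpha: float = 0.05) -> float:
    """Type II error of the two-sided normal test for a shift of lam * sigma_1."""
    u_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided percentage point
    shift = lam * sqrt(n)
    # Area under the standard normal curve between the two abscissas.
    return STD_NORMAL.cdf(u_alpha + shift) - STD_NORMAL.cdf(-u_alpha + shift)


def equivalent_lambda(lam_prime: float, n_prime: int, n: int) -> float:
    """lambda on the reference curve for n that matches (lam_prime, n_prime)."""
    return lam_prime * sqrt(n_prime / n)


# The example in the text: n' = 12, lambda' = 1, read from the curve for n = 3.
lam_ref = equivalent_lambda(1.0, 12, 3)
beta_ref = normal_test_beta(lam_ref, 3)
```

Evaluated this way, the reference value of $\lambda$ is 2 and $\beta$ comes out near .066, in line with the .07 read from the chart.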

Example: Pressure-measuring instruments are being tested against a standard 
level. It has been decided that instruments whose true mean reading is as 
much as 10 pounds per square inch away from the standard level should be 
rejected 95% of the time. On the other hand only 5% of instruments whose 
true mean reading equals that of the standard should be rejected. From past 
data, it is known that all test instruments of the type being considered have a 
stable standard deviation of 5 psi. If rejection or acceptance is to occur on the 
basis of a single sample and the normal criterion of significance, what sample 
size should be chosen to accomplish this purpose? Referring to Fig. 6 with $\lambda$ =
10/5 = 2 it is seen that a sample size of 4 provides the required assurance.
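The same answer can be reached without the chart by stepping through sample sizes until the Type II error falls to the required level. This is only a sketch of that search; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def required_sample_size(lam: float, alpha: float, beta_max: float) -> int:
    """Smallest n for which the two-sided normal test has Type II error <= beta_max.

    lam must be positive, otherwise the error never falls and the loop
    would not terminate.
    """
    std = NormalDist()
    u_alpha = std.inv_cdf(1.0 - alpha / 2.0)
    n = 1
    while True:
        shift = lam * sqrt(n)
        beta = std.cdf(u_alpha + shift) - std.cdf(-u_alpha + shift)
        if beta <= beta_max:
            return n
        n += 1


# Instrument example: a 10 psi shift with sigma_1 = 5 psi gives lambda = 2.
n_needed = required_sample_size(lam=10 / 5, alpha=0.05, beta_max=0.05)
```

For the instrument example the search ends at a sample size of 4, matching the value read from Fig. 6.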

B. In sampling two normal populations $\pi_1$ and $\pi_2$, the statistic

$$u = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

is used to accept or reject the hypothesis that $\mu_1 = \mu_2$. For generality it will be
assumed that the population standard deviations $\sigma_1$ and $\sigma_2$ may not be equal,
although they are known accurately.
Our hypotheses are

$$H_0: \mu_1 = \mu_2$$

$$H_1: |\mu_1 - \mu_2| = \lambda'\sigma_1.$$

Significance is determined in the same manner as in 5.A., and the power
function is set up in identical fashion. The value $\beta$ is found to be the area
under the standardized normal curve between the abscissas

$$\pm u_\alpha + \lambda' \sqrt{\frac{n_1 n_2}{k^2 n_1 + n_2}}$$

where $\sigma_2 = k\sigma_1$. The value of $\beta$ may easily be read from Fig. 6 for any $\lambda'$, $n_1$,
$n_2$, and $k$ by selecting the curve for a convenient sample size, $n$, on Fig. 6 and
taking

$$\lambda = \frac{\lambda'}{\sqrt{n}} \sqrt{\frac{n_1 n_2}{k^2 n_1 + n_2}}.$$
7. Power function of the $t$-test.

A. The statistic $t = \sqrt{n}(\bar{x} - a)/s$ is used to accept or reject the hypothesis that
the mean, $\mu$, of the normal population sampled, is equal to some specified level,
$a$, when the population standard deviation, $\sigma_1$, is unknown.
Our hypotheses are

$$H_0: \mu = a$$

$$H_1: |\mu - a| = \lambda\sigma_1, \quad (\lambda > 0).$$

In order to test the hypothesis $\mu = a$ we choose a significance level, $\alpha$, and compute
the statistic $t = \sqrt{n}(\bar{x} - a)/s$. If $|t| > t_\alpha$, where the percentage point,
$t_\alpha$, is determined by

$$\frac{\Gamma(n/2)}{\sqrt{(n-1)\pi}\,\Gamma\left(\frac{n-1}{2}\right)} \int_{-t_\alpha}^{+t_\alpha} \left(1 + \frac{x^2}{n-1}\right)^{-n/2} dx = 1 - \alpha,$$

we reject $H_0$ and conclude that $\mu \neq a$.
To set up the power function we note that:
If $H_0$ is true

$$\Pr\{-t_\alpha < t < +t_\alpha\} = 1 - \alpha.$$

If $H_1$ is true,

$$\Pr\{-t_\alpha < t < +t_\alpha\} = \beta, \quad (1 - \beta = \alpha \text{ if } \lambda = 0).$$

However, we have the identity

$$\Pr\left\{-t_\alpha \frac{s}{\sigma_1} + \lambda\sqrt{n} < \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma_1} < +t_\alpha \frac{s}{\sigma_1} + \lambda\sqrt{n}\right\} = \Pr\{-t_\alpha < t < +t_\alpha\}$$

where $\lambda = |\mu - a|/\sigma_1$. Hence, for any fixed $s/\sigma_1$ the above probability may be
denoted by say $h(s/\sigma_1)$ or, using the notation of section 4, $h\left(\sqrt{\chi^2/(n-1)}\right)$ and
evaluated as the area under the standardized normal curve between the abscissas
indicated. Then

$$\beta = \int_0^\infty h\left(\sqrt{\frac{\chi^2}{n-1}}\right) f(\chi^2)\, d(\chi^2)$$

where $f(\chi^2)$ is the probability density function of $\chi^2$ for $n - 1$ degrees of freedom.
This is one method of evaluating $\beta$ and it was used for calculating the operating
characteristics for $n < 5$.
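The integral above lends itself to straightforward numerical quadrature. The sketch below is not the authors' computation: it assumes Simpson's rule, integrates over $r = s/\sigma_1 = \sqrt{\chi^2/(n-1)}$ rather than over $\chi^2$ itself (an equivalent change of variable), and takes the percentage point $t_\alpha$ as an input read from a table. All function names are hypothetical.

```python
from math import exp, gamma, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ratio_density(r: float, nu: int) -> float:
    """Density of r = s/sigma_1 = sqrt(chi2/nu) when chi2 has nu degrees of freedom."""
    return (2.0 * (nu / 2.0) ** (nu / 2.0) * r ** (nu - 1)
            * exp(-nu * r * r / 2.0) / gamma(nu / 2.0))


def h(r: float, lam: float, n: int, t_alpha: float) -> float:
    """Normal-curve area between -t_alpha*r + lam*sqrt(n) and +t_alpha*r + lam*sqrt(n)."""
    shift = lam * sqrt(n)
    return STD_NORMAL.cdf(t_alpha * r + shift) - STD_NORMAL.cdf(-t_alpha * r + shift)


def t_test_beta(lam: float, n: int, t_alpha: float,
                steps: int = 4000, r_max: float = 10.0) -> float:
    """Type II error of the two-sided t-test by Simpson's rule over r.

    steps must be even; r_max is an assumed cut-off beyond which the
    density of r is negligible for the small n considered here.
    """
    nu = n - 1
    width = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * h(r, lam, n, t_alpha) * ratio_density(r, nu)
    return total * width / 3.0


# n = 5, alpha = .05: two-sided percentage point of t for 4 degrees of freedom.
beta_at_zero = t_test_beta(0.0, 5, 2.776445)
```

A useful sanity check is that at $\lambda = 0$ the computed $\beta$ should return $1 - \alpha$, and that $\beta$ should fall as $\lambda$ grows.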

It has been noted that such a formula had been employed by Neyman and
Tokarska [6] in calculating Type II errors where only one tail of the $t$-curve is
used as the region of rejection. Probabilities calculated in this manner are
provided by Neyman and Tokarska for degrees of freedom $n$ = 1 to 30 and Type
I errors of .01 and .05. As soon as the area in one tail of the non-central
$t$-distribution becomes negligible these curves are equivalent to the test treated
herein with an $\alpha$ of .02 and .10 respectively. An idea of the critical values of $\lambda$
at which this occurs may be obtained from a table in a succeeding footnote in
which they are quoted for $\alpha = .05$. The values are surprisingly small, such that
almost all of Neyman's figures can be interpreted for a two-tail region of
rejection.
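Using C. C. Craig's development of the non-central $t$ [7] we obtain²

$$\beta = \Pr\left\{-t_\alpha < \frac{\sqrt{n}(\bar{x} - \mu)/\sigma_1 + \lambda\sqrt{n}}{s/\sigma_1} < +t_\alpha\right\} = e^{-\frac{1}{2}n\lambda^2} \sum_{r=0}^{\infty} \frac{\left(\frac{1}{2}n\lambda^2\right)^r}{r!}\, I\left(r + \tfrac{1}{2}, \tfrac{1}{2}(n - 1); \frac{t_\alpha^2}{n - 1 + t_\alpha^2}\right)$$

where $I(p, q; x)$ represents the Incomplete-Beta Function Ratio [7]. This may
be conveniently used for those values of $n$ where the necessary values are obtainable
from Tables of the Incomplete-Beta Function ratio [8] and for small values
of $\lambda$ where the above series converges rapidly.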

The method actually used for $n > 4$, however, made use of the tables prepared
by Johnson and Welch [9]. Replacing their $\lambda$ by $z$ to avoid confusion
with our notation, these tables give values of $z$ tabulated against $f$, $t_0$, and $\epsilon$ such
that

$$\Pr\{t > t_0\} = \epsilon,$$

where $t$ is the sum of $\delta$ and a normally distributed variate with zero mean and
unit variance, divided by $\sqrt{w}$; $fw$ is distributed according to the $\chi^2$-distribution
with $f$ degrees of freedom, and $\delta = t_0 - z\sqrt{1 + t_0^2/2f}$. We want

$$\beta = 1 - \Pr\{t < -t_\alpha\} - \Pr\{t > t_\alpha\}.$$
For those values of $\lambda$ and $n$ for which $\Pr\{t < -t_\alpha\}$ is negligible³ we can, for
any given $\epsilon$, take $t_0 = t_\alpha$ and $f = n - 1$ and read $z$ from the tables, then determine
$\delta$ and finally $\lambda$ from the relation $\lambda = \delta/\sqrt{n}$. After computing $\beta = 1 - \epsilon$,
the point $(\lambda, \beta)$ on the operating characteristic may be graphed. At the few
places where $\Pr\{t < -t_\alpha\}$ is not negligible and $\beta$ is needed we can for a given $\lambda$
take

$$z = \frac{t_0 - \delta}{\sqrt{1 + t_0^2/2f}}$$

and then by reading $z$ for various values of $\epsilon$, $f$, $t_0$ make an inverse interpolation
for $\epsilon$, thus setting values for $\Pr\{t > -t_\alpha\}$ and $\Pr\{t > t_\alpha\}$. Finally

$$\beta = \Pr\{t > -t_\alpha\} - \Pr\{t > +t_\alpha\}.$$

² It should be noted that Craig's formula as published is in error in having $\frac{1}{2}(r + 1)$ as
the parameter in the incomplete beta function instead of $r + \frac{1}{2}$.
³ Values of $\lambda$ for which $\Pr\{t < -t_{.05}\} = .005$ are listed below.

    f = n - 1      λ
        4         .34
        5         .30
        6         .27
        7         .25
        8         .23
        9         .216
       16         .159
       36         .103
      144         .051
        ∞         .000


It was found that for $n > 10$ a good approximation for computing operating
characteristics is given by

$$\beta = \Pr\{-t_\alpha + \lambda\sqrt{n} < t < +t_\alpha + \lambda\sqrt{n}\}$$

in which the variable $t$ is distributed as central $t$ with $n - 1$ degrees of freedom.
This formula proved to be quite useful in preparation of the operating
characteristics for the $t$-test.

Fig. 7 presents operating characteristics of the $t$-test calculated by these
methods. It should be noted that in using the $t$-test, alternative hypotheses
are expressed as so many multiples of the unknown population standard deviation
away from the level stated in the null hypothesis. In some applications
the alternatives may be naturally so expressed. In many applications, however,
it may be desired to control the distance $\mu - a$ regardless of the standard
deviation of the lot sampled. In this case, one could place confidence limits
on the estimate of $\sigma$, determine the $\lambda$ value corresponding to each estimate, and
finally obtain limits on the sample sizes or risks involved.⁴

B. For the case of two normal populations, the statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}$$

is used to accept or reject the hypothesis that $\mu_1 = \mu_2$ when the two normal
population standard deviations are unknown but equal to say, $\sigma_1$.
Our hypotheses are

$$H_0: \mu_1 = \mu_2$$

$$H_1: |\mu_1 - \mu_2| = \lambda'\sigma_1.$$

Significance is determined in the same manner as in par. 6.A., and, by reasoning
similar to that in the preceding section, it is found that $\beta$ for a given $\lambda'$ can
be read from Fig. 7 by taking

$$\lambda = \frac{\lambda'}{\sqrt{n}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

and $n = n_1 + n_2 - 1$. Before a statistical test of this nature is applied the data
should be examined to verify consistency with the assumption that $\sigma_1 = \sigma_2$.

⁴ For a test of this nature in which the power of the test depends only on the absolute
value of the distance $\mu - a$ see [10].

Example: An analysis of the difference in tensile strength between two types
of castings is being conducted. A sample of 10 items is selected from each type
of casting and the $t$-test employed to establish superiority of one over the other.
Experience has shown that the variability in tensile strength for one type of
casting is comparable to that of the other type. If $\alpha$ is set equal to .05, what
percentage of the time would our significance test fail to detect a superiority of
one standard deviation in tensile strength? $n = 10 + 10 - 1 = 19$ and $\lambda$ =
.513. Referring to Fig. 7 for this $\lambda$ and $n$, it is seen that the percentage $\beta$ is
approximately 45.
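The arithmetic behind $n = 19$ and $\lambda = .513$ follows from the conversion given for two populations. A brief sketch, with a hypothetical function name:

```python
from math import sqrt


def two_sample_t_curve(lam_prime: float, n1: int, n2: int) -> tuple[int, float]:
    """Return (n, lambda) locating a two-sample t-test on the single-sample curves."""
    n = n1 + n2 - 1
    lam = lam_prime / sqrt(n) * sqrt(n1 * n2 / (n1 + n2))
    return n, lam


# Casting example: 10 items of each type, superiority of one standard deviation.
n, lam = two_sample_t_curve(lam_prime=1.0, n1=10, n2=10)
```

Here $\lambda = \sqrt{5/19}$, which rounds to the .513 quoted in the example.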

In this paper we have presented power curves or operating characteristics of
the common significance tests employing but a single sample of items. The
power of the tests obtained here does not represent the limit that can be obtained
for the average amount of inspection performed, say, over many consecutive
lots. Tests, sequential in character [11], have been shown to be much more
efficient. Nevertheless, single sampling is often the only practical procedure
available. Again, the data may be brought to the analyst as single sample
results collected supplementary to other purposes or prescribed by a standard
procedure. Finally, in performing a significance test, it is quite important to be
able to give constructive advice when the data indicate practical differences
although no statistical significance is found.

Although sequential tests using variables have been devised, no investigation 
of double sampling schemes for variables similar to the Dodge-Romig [12] 
plans for attributes has, as yet, been designed with the exception of [9]. It is 
believed, however, that such plans would have considerable application for 
industry in combining efficiency with practicability. 

The graphs of the operating characteristics in this report have been made by 
calculating a sufficient number of points to draw them in by use of French curves. 
Considering this method of plotting, slight error should be allowed for in reading
probabilities of acceptance from the graphs, especially where the curves are 
steep. 
MINIMAL VARIANCE AND ITS RELATION TO EFFICIENT
MOMENT TESTS


By J. R. Vatnsdal
State College of Washington


1. Summary. When a curve is fitted to a set of data by moments, the usual 
procedure used in testing the hypothesis that the population is of the given form 
with the parameters as computed from the moments is to compare the higher 
moments with their expected values as determined by the hypothesis. Gen- 
erally speaking, moments about the mean are computed although the reason for 
this is not clear. To shed some light on this question, the sample given in the 
introduction is fitted to two curves. Moments about various points are com- 
pared with their expected values and the discrepancy in standard units ex- 
amined. This discrepancy is found to vary widely and to have a maximum. 
The notion of equivalent moment tests is introduced, and on this basis the most 
efficient moment test is defined in such a way that of all equivalent moment 
tests, this one is most likely to reject a false hypothesis. 

For any moment it is shown that there is a point about which its variance is a 
minimum. The conditions are found which determine the position of this point 
for second and third moments. It is proved that for symmetrical populations 
the variance is minimal when the moments are computed about the mean of the 
population. If the population is an asymmetrical Pearson frequency function, 
it is proved that the point about which the third moment variance is minimal 
differs more from the mean than does the corresponding point for second mo- 
ments. The condition is pointed out for which this is true in the general case. 

The third and fourth standard semi-invariants of second moments of minimal 
variance are computed and compared to those of the second moment about the 
mean. The ratios of these are displayed for some populations to illustrate how 
this may be used to investigate when the approach to normality is more rapid 
in one case than in the other. Some examples are presented to contrast these 
and other tests. 


2. Introduction. In testing the hypothesis that a given set of observations 
is a random sample from a completely specified population (either a priori or 
specified by a consideration of the sample), generally the Chi-square test is 
applied or certain functions of the moments are compared with their expected 
values and the significance of their departure as determined by the hypothesis 
is examined. 

In the Neyman-Pearson theory it is required that the functional form be 
known. The hypothesis then is some statement concerning the parameters. 
The main principle there used is that the test used should be such that, while 
keeping the probability of rejecting the hypothesis when true at a certain
significance level, it will minimize the chance of accepting the hypothesis when
some alternative is true. 

However, if the functional form is regarded as unknown, the alternative hypoth- 
eses are then usually unknown. The test then must be one that does not 
depend on alternatives. In the light of incomplete knowledge of the distribu- 
tion of sample statistics, and since moments of moments are practically the 
only ones known, we shall here use the principle of comparing observed moments 
with their expected values. It is known that the distribution of moments in 
large samples is asymptotic to the normal distribution if the appropriate mo- 
ments of the population exist [1]. Here we shall confine ourselves to such 
populations and large samples. 

To introduce the idea which underlies the theory here presented, consider a 
simple example. Suppose a sample is given and the hypothesis is of the form
$f(x, \theta)$ with $\theta = \theta_0$. Furthermore, suppose the first moment of the sample is
equal to its expected value. If a second-moment test is used, this means that 
one computes the arithmetic mean of the squares of the deviations of the elements 
of the sample about some point, and compares this with the theoretical moment 
about the same point. Generally speaking, the point used is the mean of the 
population or the mean of the sample. However, the point may be chosen in 
any manner. For each such choice a test can be devised such that the probability
of rejecting the hypothesis when true is $\epsilon$. All such tests are called equivalent
moment tests. Among these equivalent moment tests, one particular
second-moment will have the minimal variance. This one is here called the 
most efficient moment test. 

This test has the property that the range of values of the second moment for 
which the hypothesis is accepted is as small as possible. Thus of all equivalent 
second-moment tests, this one is most likely to reject a false hypothesis. 

This idea may be easily extended to moments of higher order, in all of which 
the concept of minimal variance is fundamental. The point of view may be 
taken that the point about which the moments are computed should be such 
that the variance is a minimum, or what is equivalent, the variance of moments 
about the origin is minimized by choosing the origin properly. 

An example is here presented to bring this out more clearly. A sample of 
1,000 items is given and fitted by the first two moments to two different fre- 
quency functions. (The sample items are not given here; they are to be found 
in Tables for Statisticians [2]). The third and fourth moments have been 
computed and the discrepancies in standard units as determined by the 
hypotheses are exhibited in a table. 


This sample of 1,000 items considered as a sample from an infinite population
has these moments:

$$m_1' = 139.288$$
$$m_2' = 19692.452$$
$$m_3' = 2827467.388$$
$$m_4' = 412561061.04$$

By fitting the first two moments of the sample to curve A,

$$\frac{a^{n+1}}{\Gamma(n+1)}\, x^n e^{-ax},$$

we get $a = 0.4781516735$ and $n = 65.60079029$; to curve B,

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2},$$

we get $\mu = 139.288$ and $\sigma^2 = 291.305056$.
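The fitted constants follow from equating the mean and variance of each curve to those of the sample. The sketch below assumes the standard facts that curve A has mean $(n+1)/a$ and variance $(n+1)/a^2$; the variable names are chosen for illustration only.

```python
# Sample moments about the origin, as quoted in the text.
m1 = 139.288
m2 = 19692.452

variance = m2 - m1 ** 2  # second moment about the mean

# Curve A: a**(n+1) * x**n * exp(-a*x) / Gamma(n+1),
# whose mean is (n + 1)/a and whose variance is (n + 1)/a**2.
a = m1 / variance
n = m1 * a - 1.0

# Curve B: normal curve with the same mean and variance.
mu = m1
sigma_squared = variance
```

Both sets of constants agree with the printed values to the number of digits given.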
The discrepancy between the observed and theoretical $r$th moment about any
point is measured by

$$t = \frac{m_r' - \mu_r'}{\sqrt{(\mu_{2r}' - \mu_r'^2)/n}}$$

in which $m_r'$ is the $r$th moment of the sample of $n$ about this point, and $\mu_r'$ is
the $r$th moment of the population about the same point.

The values of $|t|$ have been computed corresponding to various points for the
third and fourth moments. These are exhibited in four tables, given below.

Examination of the table for the discrepancy between the observed and theoretical
third moments for curve B, shows that when this moment is computed
about $x = 0$, the hypothesis is accepted at the 1% level; this is also true for $x$
= 39.3, but for $x = 139.3$ the hypothesis would be rejected at that level. It
is evident that some rule must be established to decide what point is to be used
to make the test.

If the curve is fitted by the first two moments the value $m_3' - \mu_3'$ is the same
for every point. This is easily demonstrated, for if $m_3''$ and $\mu_3''$ are measured
about a point $h$ units to the right of the origin, $m_3'' = m_3' - 3hm_2' + 3h^2m_1' - h^3$
and $\mu_3'' = \mu_3' - 3h\mu_2' + 3h^2\mu_1' - h^3$. Now, $m_2' = \mu_2'$ and $m_1' = \mu_1'$. It follows
that $m_3'' - \mu_3'' = m_3' - \mu_3'$.
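The invariance of the numerator, and the resulting dependence of $|t|$ on the variance alone, can be illustrated with the sample moments quoted earlier and curve B. This is a sketch under the assumption that the third and sixth moments of the fitted normal curve about a point are obtained from the usual normal moment formulas; the function names are hypothetical.

```python
from math import sqrt

# Sample moments about the origin and sample size, as quoted in the text.
m1, m2, m3 = 139.288, 19692.452, 2827467.388
size = 1000

# Curve B: normal population fitted by the first two moments.
mu = m1
var = m2 - m1 ** 2


def sample_third_moment(h: float) -> float:
    """Third sample moment about the point h, from the moments about the origin."""
    return m3 - 3 * h * m2 + 3 * h ** 2 * m1 - h ** 3


def normal_third_moment(h: float) -> float:
    """Third moment of the fitted normal curve about the point h."""
    d = mu - h
    return d ** 3 + 3 * d * var


def normal_sixth_moment(h: float) -> float:
    """Sixth moment of the fitted normal curve about the point h."""
    d = mu - h
    return d ** 6 + 15 * d ** 4 * var + 45 * d ** 2 * var ** 2 + 15 * var ** 3


def discrepancy(h: float) -> float:
    """|t| for the third moment about h: the difference in standard units."""
    numerator = sample_third_moment(h) - normal_third_moment(h)
    variance = (normal_sixth_moment(h) - normal_third_moment(h) ** 2) / size
    return abs(numerator) / sqrt(variance)
```

The numerator is the same at every point, while `discrepancy(mu)` comes to about 5.57, the value tabulated for curve B at 139.3, and falls away on either side of the mean.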

The maximum value of $|t|$ is attained when the variance of third moments is
a minimum. In this manner it is assured that the range of values for which
the third moment is accepted shall be a minimum.

If the third moments agree, or the agreement is sufficiently close such that the
hypothesis cannot be rejected, $m_3' - \mu_3'$ is constant or varies only slightly from
point to point, so that minimizing the variance yields the maximum value of $t$.

As is seen from the tables above, when the moments are compared at the dif- 
ferent points, the hypothesis may be accepted for one point and rejected for 
another. By the principle of using the point which yields the minimal variance, 
the hypothesis will be rejected more often than for other points. Thus, of all 
equivalent moment tests, this one is most likely to reject a false hypothesis. 

The problem of determining for various moments how the origin may be 
chosen such that the variance of the distribution of these moments shall be a 
minimum is now considered. 


TABLES

Values of $|t|$ for moments computed about various points.

              Curve A.                              Curve B.
  Third moments    Fourth moments     Third moments    Fourth moments
  Point    |t|     Point    |t|       Point    |t|     Point    |t|
     0    .0865       0     .197         0     .085       0     .02
    50    .084       50     .697       39.3    .19      39.3    .13
   100    .33       100    4.74        89.3    .69      99.3    .88
   120    .77       120   14.17       109.3   1.16     109.3   1.09
   130   1.28       130   26.76       119.3   2.39     119.3   2.00
   140   1.91       140   49.03       129.3   4.05     129.3   3.18
   142   1.95       145   45.26       139.3   5.57     133.3   3.83
   145   1.90       150   42.89       149.3   4.05     135.3   3.96
   150   1.60       160   21.31       159.3   2.39     137.3   3.93
   160    .95       180    6.25       169.3   1.16     139.3   3.67
   170    .57       200    2.51       179.3    .98     140.3   3.46
   180    .387      300     .183      189.3    .69     143.3   2.72
   200    .18                         199.3    .50     148.3   1.59
                                      209.3    .38     159.3    .39
                                      239.3    .19     179.3    .13
                                                       239.3    .07

3. First moments. In the case of the first moment, whose expected value is
the mean of the population, the variance is given by $\frac{1}{n}(\mu_2' - \mu_1'^2)$. It is obvious
that the choice of origin does not affect the variance of the first moment, since
it is well known that $\mu_2' - \mu_1'^2$ is invariant with respect to choice of origin.

4. Minimal variance of second moments. The variance of second moments
about an arbitrary origin is $\frac{1}{n}(\mu_4' - \mu_2'^2)$. Expressed in terms of $\mu_1'$ and central
moments, this may be written

$$\mu_2(m_2') = \frac{1}{n}\left(\mu_4 - \mu_2^2 + 4\mu_3\mu_1' + 4\mu_2\mu_1'^2\right). \tag{1}$$

Here it is evident that the variance of second moments does depend on the
choice of origin, and is not invariant under translation.
The minimum value of $\mu_2(m_2')$ is given by $\mu_1' = -\frac{\mu_3}{2\mu_2}$ and is $\frac{1}{n}\left(\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2}\right)$.
Then we may write

$$\mu_2(\tilde{m}_2) = \frac{1}{n}\left(\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2}\right). \tag{2}$$
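Formula (1) and its minimum can be verified directly on any distribution whose moments are easy to compute. The following sketch uses a small discrete population that is entirely hypothetical, chosen only because it is skewed; the quantities computed are $n$ times the variances in the text.

```python
def central_moments(values, probs):
    """Mean and central moments mu2, mu3, mu4 of a discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    mu = {r: sum(p * (x - mean) ** r for x, p in zip(values, probs)) for r in (2, 3, 4)}
    return mean, mu[2], mu[3], mu[4]


def n_times_variance(values, probs, origin):
    """n times the variance of the second sample moment about `origin`."""
    raw2 = sum(p * (x - origin) ** 2 for x, p in zip(values, probs))
    raw4 = sum(p * (x - origin) ** 4 for x, p in zip(values, probs))
    return raw4 - raw2 ** 2


# Hypothetical skewed population, chosen only for illustration.
values = [0.0, 1.0, 2.0, 5.0]
probs = [0.4, 0.3, 0.2, 0.1]

mean, mu2, mu3, mu4 = central_moments(values, probs)
best_mu1 = -mu3 / (2 * mu2)      # minimizing value of mu1' (mean measured from the origin)
best_origin = mean - best_mu1    # the corresponding point on the x axis
minimum = mu4 - mu2 ** 2 - mu3 ** 2 / mu2
```

For this population the variance about `best_origin` equals `minimum`, and moving the origin in either direction increases it, as the quadratic form of (1) requires.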
Throughout this paper $\tilde{m}_2$ denotes the second moment of the sample about
an origin chosen such that $\mu_1' = -\frac{\mu_3}{2\mu_2}$, which is the value of $\mu_1'$ which minimizes
(1); $\bar{m}_2$ denotes the second moment about an origin chosen such that $\mu_1' = 0$;
$m_2$ denotes the second moment about the mean of the sample. It may be noted
that in large samples the distributions of $\bar{m}_2$ and $m_2$ are approximately the same.

It is clear from (2) that if $\mu_3 = 0$, or, if the population is symmetric, i.e. $f(-x)$
= $f(x)$, then $\mu_2(\tilde{m}_2) = \mu_2(\bar{m}_2)$. However, if $\mu_3 \neq 0$ then $\mu_2(\tilde{m}_2) < \mu_2(\bar{m}_2)$.


5. A moment inequality. Since the quantity given by (2) is essentially
non-negative, an inequality is obtained valid for any distribution in which the
first four moments exist, viz.

$$\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2} \geq 0, \quad \mu_2 \neq 0 \tag{3}$$

or in standard moments

$$\alpha_4 - \alpha_3^2 - 1 \geq 0. \tag{4}$$


This is a stronger inequality than the one given by Bertelsen [3], i.e. a; — 
as — 2 < O or the one generally known, a; > a3,[4]. This inequality, however, 
was known to K. Pearson [5, p. 432], although he derived it from a different 
point of view. 
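Inequality (4) is easy to check numerically. The sketch below evaluates its left side for two hypothetical discrete populations; a two-point population is included because, as a short calculation shows, it attains equality.

```python
def alpha_3_and_4(values, probs):
    """Standard third and fourth moments of a discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    mu2, mu3, mu4 = (sum(p * (x - mean) ** r for x, p in zip(values, probs))
                     for r in (2, 3, 4))
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


def slack(values, probs):
    """Left side of (4): alpha4 - alpha3**2 - 1, which should never be negative."""
    a3, a4 = alpha_3_and_4(values, probs)
    return a4 - a3 ** 2 - 1.0


# Hypothetical populations, chosen only for illustration.
two_point = slack([0.0, 1.0], [0.7, 0.3])                    # equality, up to rounding
skewed = slack([0.0, 1.0, 2.0, 5.0], [0.4, 0.3, 0.2, 0.1])   # strictly positive
```

The second value equals the minimum in (2), taken with $n = 1$, divided by $\mu_2^2$, which is how (4) follows from (3).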


6. Minimal variance of higher moments. The variance of the distribution
of rth moments of random samples about an arbitrary origin always has a
minimum. The variance of $m_r'$ is given by

(5) $\mu_2(m_r') = \frac{1}{n}\left(\mu_{2r}' - \mu_r'^2\right).$

This expression when expanded in powers of $\mu_1'$ is always a polynomial of even
degree with the coefficient of the highest power a positive number. Furthermore,
by differentiating $\mu_2(m_r')$ with respect to $\mu_1'$ and equating the derivative to
zero, the value of $\mu_1'$ which minimizes $\mu_2(m_r')$ will be found among the solutions
of that equation.

For third moments of samples the variance is given by

$\mu_2(m_3') = \frac{1}{n}\left[\mu_6' - \mu_3'^2\right]$

which, when expressed in terms of moments about the mean and powers of the
mean, becomes

(6) $\mu_2(m_3') = \frac{1}{n}\left[\mu_6 - \mu_3^2 + 6(\mu_5 - \mu_3\mu_2)\mu_1' + (15\mu_4 - 9\mu_2^2)\mu_1'^2 + 18\mu_3\mu_1'^3 + 9\mu_2\mu_1'^4\right].$
Differentiating with respect to $\mu_1'$ and equating to zero, we have

(7) $6\mu_2\mu_1'^3 + 9\mu_3\mu_1'^2 + (5\mu_4 - 3\mu_2^2)\mu_1' + (\mu_5 - \mu_2\mu_3) = 0.$

By straightforward application of the methods of solving cubics, it is easy to
show by means of (3) that (7) has one real root only, which moreover is
$\gtreqless -\frac{\mu_3}{2\mu_2}$ according as

$\alpha_5 - \alpha_3\left(\tfrac{5}{2}\alpha_4 - \tfrac{3}{2}\alpha_3^2 - \tfrac{1}{2}\right) \lesseqgtr 0.$

Since it can also be shown by means of (3) that the second derivative of (6) is
positive, this root of (7) will minimize $\mu_2(m_3')$.
These facts demonstrate:

THEOREM I. The point about which the arithmetic mean of the cubes of the
variates has minimal variance is to the right, at, or to the left of the corresponding
point for the squares according as

(8) $\alpha_5 - \alpha_3\left(\tfrac{5}{2}\alpha_4 - \tfrac{3}{2}\alpha_3^2 - \tfrac{1}{2}\right) \gtreqless 0.$

By examination of (7) it is readily seen that if $\alpha_5 = \alpha_3$ or if the population is
symmetric, the real root will be zero; so that for such a population the variance
of third moments is a minimum when moments are taken about the mean of the
population. If $\alpha_5 \neq \alpha_3$ the variance of third moments will be a minimum when
taken about some other point.

For fourth moments of samples the variance is of the sixth degree in $\mu_1'$ and
its derivative therefore of the fifth degree. There is not much to be said in a
general way except that if $\alpha_7 = \alpha_3\alpha_4$ or if the population is symmetric, $\mu_1' = 0$
will cause this derivative to vanish.

If the distribution is a Pearson frequency function, from the recursion formula
for the moments [6, p. 24],

$\alpha_5 = \alpha_3\left(\frac{2\alpha_4 + 4 + 2\delta}{1 - \delta}\right), \qquad \delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}.$

The criterion (8) can be written

(9) $\alpha_3\left(\frac{2\alpha_4 + 4 + 2\delta}{1 - \delta}\right) + \tfrac{1}{2}\alpha_3 + \tfrac{3}{2}\alpha_3^3 - \tfrac{5}{2}\alpha_3\alpha_4.$

It will now be shown that (9) $\gtreqless 0$ according as $\alpha_3 \gtreqless 0$, since (9) is $\alpha_3 D$ where

(10) $D = \frac{2\alpha_4 + 4 + 2\delta}{1 - \delta} + \tfrac{1}{2} + \tfrac{3}{2}\alpha_3^2 - \tfrac{5}{2}\alpha_4.$

It suffices to show that $D > 0$ for all Pearson curves. Using the method of
Lagrange multipliers, it is possible to show that within the permissible range of
values of the variables involved, the g.l.b. of $D$ is 3, and so $D > 0$. It has been
proved that the variance of the squares is a minimum when $\mu_1' = -\frac{\alpha_3}{2}\sigma$. It has
just been shown that the sign of (9) agrees with that of $\alpha_3$. These, together with
Theorem I, demonstrate

THEOREM II. For Pearson frequency functions, $\alpha_3 \neq 0$, the point about which
the variance of cubes is a minimum deviates more from the mean than does the
corresponding point for the squares.


7. Symmetric populations. For the distribution of rth moments of samples

(11) $\mu_2(m_r') = \frac{1}{n}\left(\mu_{2r}' - \mu_r'^2\right).$

To find the minimum of (11) expand in terms of central moments and powers
of $\mu_1'$, differentiate with respect to $\mu_1'$, and equate to zero. This yields:

(12) $(2r-2)r^2\mu_2\mu_1'^{2r-3} + \cdots + K\mu_1'^{K-1}\left[\binom{2r}{K}\mu_{2r-K} - \sum_{i=0}^{K}\binom{r}{i}\binom{r}{K-i}\mu_{r-i}\mu_{r-K+i}\right] + \cdots + 2r(\mu_{2r-1} - \mu_r\mu_{r-1}) = 0.$

For each power of $\mu_1'$, the coefficient is an isobaric moment function and is of
even weight when the power of $\mu_1'$ is odd, and of odd weight when the power of
$\mu_1'$ is even. If the population is symmetric the coefficients of even powers will
vanish as will the constant term. Then $\mu_1'$ will be a factor, the other factor
being a polynomial with only even powers of $\mu_1'$. In this latter factor, where $K$
is even, the coefficient of $K\mu_1'^{K-2}$ is

(13) $\binom{2r}{K}\mu_{2r-K} - \sum_{i=0}^{K}\binom{r}{i}\binom{r}{K-i}\mu_{r-i}\mu_{r-K+i}.$

Since

$\binom{x+y}{n} = \sum_{i=0}^{n}\binom{x}{n-i}\binom{y}{i},$

(13) may be written

$a\,\mu_{2r-K} + \sum_{i=0}^{K} b_i\left(\mu_{2r-K} - \mu_{r-i}\mu_{r-K+i}\right); \qquad r - i,\ K \text{ even},$

where $a$, $b_i$ are non-negative integers.

It can be immediately established by use of an inequality due to Tchebycheff
[7, pp. 43, 168] that $\mu_{2k+2i} \geq \mu_{2k}\mu_{2i}$ and therefore (13) is positive or zero.

To sum up, if the odd moments vanish (12) will have a factor $\mu_1'$ and a factor
which is a polynomial with even powers only of $\mu_1'$ with positive coefficients;
therefore there is one and only one solution, $\mu_1' = 0$. This establishes

THEOREM III. For a symmetrical population, the distribution of rth moments
of samples has minimal variance when the origin is the population mean.









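8. Distribution of second moments. To study in more detail the distributions
of $m_2^*$ and $m_2^0$ the higher moments are computed and compared. Applying
the formula for the distribution of rth moments we obtain, for $m_2^0$,

(14)
$\mu_1'(m_2^0) = \mu_2,$
$\mu_2(m_2^0) = \frac{1}{n}\left(\mu_4 - \mu_2^2\right),$
$\alpha_3(m_2^0) = \frac{1}{\sqrt{n}}\,\frac{\alpha_6 - 3\alpha_4 + 2}{(\alpha_4 - 1)^{3/2}},$
$\alpha_4(m_2^0) - 3 = \frac{1}{n}\left[\frac{\alpha_8 - 4\alpha_6 + 6\alpha_4 - 3}{(\alpha_4 - 1)^2} - 3\right],$
etc.

For the distribution of $m_2^*$, we get

(15)
$\mu_1'(m_2^*) = \mu_2 + \frac{\mu_3^2}{4\mu_2^2},$
$\mu_2(m_2^*) = \frac{1}{n}\left(\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2}\right),$
$\alpha_3(m_2^*) = \frac{1}{\sqrt{n}}\,\frac{\alpha_6 - 3\alpha_4 + 2 + 3\alpha_3^2 - 3\alpha_5\alpha_3 + 3\alpha_4\alpha_3^2 - \alpha_3^4}{(\alpha_4 - \alpha_3^2 - 1)^{3/2}},$
$\alpha_4(m_2^*) - 3 = \frac{1}{n}\left[(\alpha_8 - 4\alpha_6 + 6\alpha_4 - 3 + 12\alpha_5\alpha_3 - 6\alpha_3^2 - 4\alpha_7\alpha_3 + 6\alpha_6\alpha_3^2 - 12\alpha_4\alpha_3^2 + 4\alpha_3^4 - 4\alpha_5\alpha_3^3 + \alpha_4\alpha_3^4)(\alpha_4 - \alpha_3^2 - 1)^{-2} - 3\right],$
etc.

Computing the ratios of $\alpha_3$'s, we have

(16) $\frac{\alpha_3(m_2^*)}{\alpha_3(m_2^0)} = \left[1 - \frac{\alpha_3\{3(\alpha_5 - \alpha_3) - \alpha_3(3\alpha_4 - \alpha_3^2)\}}{\alpha_6 - 3\alpha_4 + 2}\right]\left(1 - \frac{\alpha_3^2}{\alpha_4 - 1}\right)^{-3/2}.$

Similarly

(17) $\frac{\alpha_4(m_2^*) - 3}{\alpha_4(m_2^0) - 3} = \left[1 - \frac{\alpha_3(4\alpha_7 + 6\alpha_4\alpha_3 + 4\alpha_5\alpha_3^2 + 12\alpha_3 - 12\alpha_5 - 6\alpha_6\alpha_3 - \alpha_3^3 - \alpha_4\alpha_3^3)}{\alpha_8 - 4\alpha_6 - 3\alpha_4^2 + 12\alpha_4 - 6}\right]\left(1 - \frac{\alpha_3^2}{\alpha_4 - 1}\right)^{-2}.$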
It is evident that when $\alpha_3 = 0$, the ratio in each case is unity. These ratios
seem too involved to make any other general statements, but for particular types
of populations these ratios in terms of the parameters are considerably simplified.

To illustrate this statement, consider

$f(x) = \frac{e^{-M}M^x}{x!}.$

From the foregoing formulas we compute

$\mu_1'(m_2^*) = M + \tfrac{1}{4}, \qquad \mu_1'(m_2^0) = M,$

$\mu_2(m_2^*) = \frac{2M^2}{n}, \qquad \mu_2(m_2^0) = \frac{2M^2 + M}{n},$

(18) $\frac{\alpha_3(m_2^*)}{\alpha_3(m_2^0)} = \sqrt{\frac{2}{M}}\,\frac{(2M + 1)^{5/2}}{8M^2 + 22M + 1},$

(19) $\frac{\alpha_4(m_2^*) - 3}{\alpha_4(m_2^0) - 3} = \frac{(12M^2 + 36M + 2)(2M + 1)^2}{M(48M^3 + 384M^2 + 112M + 1)}.$

The minimum value of (18) is 0.71 for $M = 1.22$ and (18) is $< 1$ for $M > 0.31$.
The minimum of (19) is 0.70 and (19) is $< 1$ for $M > 0.62$. For the Poisson
distribution, then, not only is the variance of $m_2^*$ less than that of $m_2^0$, but at least
as far as the first four moments are concerned, the distribution of $m_2^*$ approaches
normality more rapidly than does $m_2^0$ for all values of $M > 0.62$.
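The numerical statements about the two Poisson ratios can be checked with a simple grid search. The sketch below is an added illustration: the function names, the grid from 0.05 to 20.00 and its step of 0.01 are assumptions, and the two formulas are the ratios (18) and (19) written out as functions of $M$.

```python
from math import sqrt


def ratio_alpha3(M):
    """Ratio (18): alpha_3(m2*) / alpha_3(m2^0) for a Poisson population."""
    return sqrt(2.0 / M) * (2 * M + 1) ** 2.5 / (8 * M ** 2 + 22 * M + 1)


def ratio_excess(M):
    """Ratio (19): (alpha_4(m2*) - 3) / (alpha_4(m2^0) - 3)."""
    numerator = (12 * M ** 2 + 36 * M + 2) * (2 * M + 1) ** 2
    denominator = M * (48 * M ** 3 + 384 * M ** 2 + 112 * M + 1)
    return numerator / denominator


# Grid of M values (an arbitrary choice for illustration).
grid = [i / 100 for i in range(5, 2001)]

m18 = min(grid, key=ratio_alpha3)   # location of the minimum of (18)
m19 = min(grid, key=ratio_excess)   # location of the minimum of (19)

print(m18, ratio_alpha3(m18))
print(m19, ratio_excess(m19))
```

Both ratios tend to 1 as $M$ grows, so the advantage of $m_2^*$ is greatest for moderate $M$ and fades for large $M$.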


When one follows the same procedure for the binomial distribution it is found that not only
is the variance of $m_2^*$ less than that of $m_2^0$, but as far as the first four moments
are concerned, the distribution of $m_2^*$ approaches normality more rapidly than
does $m_2^0$, for values of $p > 0.7$.
In the case of higher moments, it seems desirable to solve the necessary equa- 
tions in each particular case, since the equations are somewhat involved. 


9. Examples. A few examples are exhibited to illustrate the foregoing ideas
and to contrast with some of the other methods.
1. A sample of 1,000 is obtained with the following distribution

x:    0    1    2    3    4
f:  625  269   91   11    4

The hypothesis being tested is that the population is $f(x) = \frac{e^{-M}M^x}{x!}$, with $M = 0.5$.
$\bar{x} = 0.5$ and therefore the mean does not differ from its expected value.

By using the $m_2^*$ test, we compute $t = 2.06$. If $m_2^*$ is distributed normally,
the hypothesis is rejected at the 5% level. By using the $m_2^0$ test, we find $t = 1.45$,
and therefore by this test the hypothesis is not rejected at the 5% level.
Applying the $\chi^2$ test, we find that the hypothesis is not rejected at the 5% level.
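The two $t$ values in this first example can be reproduced from the frequency table and the Poisson formulas of section 8. The following is a minimal sketch added for illustration; the variable and function names are assumptions, and $t$ is taken to be the difference between the sample moment and its expected value, divided by the standard deviation of the sample moment.

```python
from math import sqrt

# Observed frequency distribution from the example (n = 1000).
x_values = [0, 1, 2, 3, 4]
freqs = [625, 269, 91, 11, 4]
n = sum(freqs)
M = 0.5  # hypothesized Poisson parameter


def sample_second_moment(origin):
    """Second moment of the sample about the given origin."""
    return sum(f * (x - origin) ** 2 for x, f in zip(x_values, freqs)) / n


# m2^0: origin at the population mean M.
# Expected value mu_2 = M, variance (2M^2 + M)/n.
m2_zero = sample_second_moment(M)
t_zero = (m2_zero - M) / sqrt((2 * M ** 2 + M) / n)

# m2^*: origin chosen so that mu_1' = -mu_3/(2 mu_2) = -1/2, i.e. origin = M + 1/2.
# Expected value M + 1/4, variance 2M^2/n.
m2_star = sample_second_moment(M + 0.5)
t_star = (m2_star - (M + 0.25)) / sqrt(2 * M ** 2 / n)

print(round(t_zero, 2), round(t_star, 2))
```

The two statistics have the same numerator here (0.046); the larger value of $t$ for $m_2^*$ comes entirely from its smaller variance.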
2. We return now to the sample mentioned in the introduction.
Since the parameters in population A were found by fitting the first two moments,
the tests will be made on the higher moments. From the definition of
$m_2^0$ and $m_2^*$ it is clear what is meant by $m_3^0$, $m_3^*$, $m_4^0$ and $m_4^*$.
Consider the discrepancy of third moments in standard units $t$ as a function
of $h$, the distance from the origin. It is easy to see that

$t = (m_3' - \mu_3')/\sqrt{G},$

where

$G = \frac{1}{n}\left[\mu_6' - \mu_3'^2 - 6h(\mu_5' - \mu_3'\mu_2') + 3h^2(5\mu_4' - 3\mu_2'^2 - 2\mu_3'\mu_1') - 18h^3(\mu_3' - \mu_2'\mu_1') + 9h^4(\mu_2' - \mu_1'^2)\right].$

For the $m_3^0$ test, $h = 139.288$. The value of $h$ which minimizes the variance
is a solution of

$6(\mu_2' - \mu_1'^2)h^3 - 9(\mu_3' - \mu_2'\mu_1')h^2 + (5\mu_4' - 3\mu_2'^2 - 2\mu_3'\mu_1')h - (\mu_5' - \mu_3'\mu_2') = 0,$

which, for this population, is $h = 142.66$. Using these values
and computing, we find, for the $m_3^0$ test, $t = 1.90$ and for the $m_3^*$ test, $t = 1.95$.

Using the same methods applied to fourth moment tests, we obtain for the
$m_4^0$ test, $h = 139.288$ and $t = 48.7$, and for the $m_4^*$ test, $h = 143.73$ and $t = 52.4$.

The $\chi^2$ test cannot be used here since the moments alone are given; furthermore
there is some difficulty in interpreting it under these conditions.

In this particular example, the third moment test would not reject the hypothesis
at the 1% level, while the fourth moment test would reject at that level.

3. Since population B is symmetric, it is known that the $m_3^0$ and $m_3^*$ tests are
identical; similarly for $m_4^0$ and $m_4^*$. For the $m_3^0$ test, $t = 5.57$, which would
reject the hypothesis at the 1% level. The fourth moment test would not be
applied in practice.

The writer wishes to acknowledge his indebtedness to Professor P. S. Dwyer 
for counsel and guidance. He also wishes to thank Professors H. C. Carver and 
C. C. Craig for valuable suggestions. 
TOLERANCE LIMITS FOR A NORMAL DISTRIBUTION¹


By A. WALD AND J. WOLFOWITZ
Columbia University and University of North Carolina


Summary. The problem of constructing tolerance limits for a normal universe
is considered. The tolerance limits are required to be such that the probability
is equal to a preassigned value $\beta$ that the tolerance limits include at least a
given proportion $\gamma$ of the population. A good approximation to such tolerance
limits can be obtained as follows: Let $\bar{x}$ denote the sample mean and $s^2$ the sample
estimate of the variance. Then the approximate tolerance limits are given by

$\bar{x} - \sqrt{\frac{n}{\chi^2_{n,\beta}}}\; r s \qquad \text{and} \qquad \bar{x} + \sqrt{\frac{n}{\chi^2_{n,\beta}}}\; r s,$

where $n$ is one less than the number $N$ of observations, $\chi^2_{n,\beta}$ denotes the number for
which the probability that $\chi^2$ with $n$ degrees of freedom will exceed this number is
$\beta$, and $r$ is the root of the equation

$\frac{1}{\sqrt{2\pi}}\int_{1/\sqrt{N} - r}^{1/\sqrt{N} + r} e^{-t^2/2}\,dt = \gamma.$

The number $\chi^2_{n,\beta}$ can be obtained from a table of the $\chi^2$ distribution and $r$ can be
determined with the help of a table of the normal distribution.


1. Introduction. The problem of setting tolerance limits for a distribution
on the basis of an observed sample was discussed by S. S. Wilks [1], [2] and by
one of the present authors [3], [4]. For a univariate distribution the problem may
be formulated briefly as follows: Let $x$ be the chance variable under consideration
and let $x_1, \cdots, x_N$ be a sample of $N$ independent observations on $x$. Two
functions, $L_1$ and $L_2$, of the sample are to be constructed such that the probability
that the limits $L_1$ and $L_2$ will include at least a given proportion $\gamma$ of the
population is equal to a preassigned value $\beta$. The limits $L_1$ and $L_2$ are called tolerance
limits.

The following two cases have been treated in the literature: (1) Nothing is
known about the distribution of $x$, except perhaps that it is continuous, or that it
admits a continuous probability density function. (2) The functional form of
the distribution of $x$ is known and only the values of a finite number of parameters
involved in the distribution of $x$ are unknown. We shall refer to (1) as the
non-parametric case and to (2) as the parametric case. An exact solution of the
problem for univariate distributions in the non-parametric case has been given
by S. S. Wilks [1]. His results have been extended to multivariate distributions
by one of the present authors [3]. An asymptotic solution of the problem in the
parametric case, which may be used for large samples, was given in [4].²


1 This paper reports work done by the authors in the Statistical Research Group, Division
of War Research, Columbia University, under contract OEMsr-618 with the Applied
Mathematics Panel, National Defense Research Committee. The work was first reported
in an unpublished memorandum, "Tolerance Limits for a Normal Distribution" (SRG
number 392, 3 January 1945) written by the authors, of whom one was a staff member and
the other a consultant of the Group. The problem was suggested by W. Allen Wallis on
the grounds that the limits previously proposed (see [4], section 5) are unsatisfactory for
most practical purposes.

In the present paper we shall deal with the problem of setting tolerance limits
for a normal distribution with unknown mean and variance. Approximation
formulas are obtained which differ from the exact values by a magnitude of the
order $1/N^2$. They give much closer approximations to the exact values than
those which can be obtained by applying the general asymptotic results in [4]
to the normal distribution. In addition, the approximation formulas in the
present paper have the advantage of considerable simplicity and can easily be
computed with the help of tables of the normal and $\chi^2$ distributions. To estimate
the closeness of the approximation of the formulas given in this paper, a method
of computing upper and lower limits for the exact values has been derived.
Computations show that the approximation is good even for small values of $N$. A few
numerical examples are given in section 7.


2. Precise formulation of the problem and notation. Let $x_1, \cdots, x_N$ be $N$
independent observations from a normal population with mean $\mu$ and variance
$\sigma^2$, both unknown. We shall denote by $\bar{x}$ the arithmetic mean of the observations
and by $s^2$ the sample estimate of the population variance $\sigma^2$, i.e.,

(2.1) $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$

and

(2.2) $s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{n}$, where $n = N - 1$.

For any positive $\lambda$ we shall denote by $A(\bar{x}, s, \lambda)$, or more briefly by $A$, the proportion
of the normal universe included between the limits $\bar{x} - \lambda s$ and $\bar{x} + \lambda s$, i.e.,

(2.3) $A = A(\bar{x}, s, \lambda) = \frac{1}{\sigma\sqrt{2\pi}}\int_{\bar{x}-\lambda s}^{\bar{x}+\lambda s} e^{-(1/2\sigma^2)(t-\mu)^2}\,dt.$

$A$ is a chance variable, since the limits of integration are chance variables. In
this paper we shall deal with the problem of determining the value of $\lambda$ so that
the probability that $A$ exceeds a preassigned value $\gamma$ is equal to a preassigned
value $\beta$. The desired tolerance limits will then be given by $\bar{x} - \lambda s$ and $\bar{x} + \lambda s$,
respectively. In practice, the values $\beta$ and $\gamma$ will usually be chosen near unity,
frequently $\geq .95$.





2 Although the results obtained in the non-parametric case could be applied to the 
parametric case as well, it would not be satisfactory to do so, since for the parametric case 
methods having greater efficiency can be devised by taking into account the available in- 
formation regarding the functional form of the distribution. 
It can be verified that the distribution of $A$ does not depend on the unknown
parameters $\mu$ and $\sigma$. Thus we can assume without loss of generality that $\mu = 0$
and $\sigma = 1$.

For any given positive value $\lambda$ we shall denote by $P(\gamma,\lambda)$ the probability that
$A \geq \gamma$. For a given value $\bar{x}$ we shall denote by $P(\gamma,\lambda \mid \bar{x})$ the conditional
probability that $A \geq \gamma$ under the condition that the sample mean has a given value
$\bar{x}$. It is clear that $P(\gamma,\lambda)$ is equal to the expected value of $P(\gamma,\lambda \mid \bar{x})$, i.e.,

(2.4) $P(\gamma,\lambda) = \frac{\sqrt{N}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} P(\gamma,\lambda \mid \bar{x})\, e^{-N\bar{x}^2/2}\,d\bar{x}.$
V 2r /-« 


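3. Method of computing $P(\gamma,\lambda \mid \bar{x})$ for any given values $\gamma$, $\lambda$ and $\bar{x}$. Since
$A = A(\bar{x},s,\lambda)$ is a strictly increasing function of $s$, the equation in $s$

(3.1) $A(\bar{x},s,\lambda) = \gamma$

has exactly one root in $s$. Denote this root by

(3.2) $s = r(\bar{x},\gamma,\lambda).$

Thus, $r(\bar{x},\gamma,\lambda)$ is that value for which

(3.3) $\frac{1}{\sqrt{2\pi}}\int_{\bar{x}-\lambda r(\bar{x},\gamma,\lambda)}^{\bar{x}+\lambda r(\bar{x},\gamma,\lambda)} e^{-t^2/2}\,dt = \gamma.$

It is clear that $\lambda r(\bar{x},\gamma,\lambda)$ does not depend on $\lambda$. We shall write

(3.4) $\lambda r(\bar{x},\gamma,\lambda) = r(\bar{x},\gamma).$

Obviously $r(\bar{x},\gamma)$ is that value for which

(3.5) $\frac{1}{\sqrt{2\pi}}\int_{\bar{x}-r(\bar{x},\gamma)}^{\bar{x}+r(\bar{x},\gamma)} e^{-t^2/2}\,dt = \gamma.$

For given values of $\bar{x}$ and $\gamma$ the value $r(\bar{x},\gamma)$ can be obtained from a table of the
normal distribution.

Since $A(\bar{x},s,\lambda)$ is a strictly increasing function of $s$, the inequality $A(\bar{x},s,\lambda) \geq \gamma$
is equivalent to the inequality $s \geq r(\bar{x},\gamma,\lambda) = r(\bar{x},\gamma)/\lambda$. Hence, since $\bar{x}$ and $s$
are independently distributed, we have

(3.6) $P(\gamma,\lambda \mid \bar{x}) = P\left(s \geq r(\bar{x},\gamma)/\lambda\right)$

where $P(s \geq c)$ denotes the probability that $s \geq c$ for any constant $c$. In general,
for any relation $R$ we shall denote by $P(R)$ the probability that $R$ holds.
Since $ns^2$ has the $\chi^2$ distribution with $n = N - 1$ degrees of freedom, we have

(3.7) $P\left(s \geq \frac{r(\bar{x},\gamma)}{\lambda}\right) = P\left(\chi^2_n \geq \frac{n\,r^2(\bar{x},\gamma)}{\lambda^2}\right)$

where $\chi^2_n$ stands for a random variable which has the $\chi^2$ distribution with $n$
degrees of freedom. The probability on the right-hand side of (3.7) can be obtained
from a table of the $\chi^2$ distribution.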
Hence, we see that the computation of $P(\gamma,\lambda \mid \bar{x})$ for given values $\gamma$, $\lambda$ and $\bar{x}$
can be carried out in two simple steps. First we determine the value of $r(\bar{x},\gamma)$
from a table of the normal distribution and then read the value of

$P\left(\chi^2_n \geq \frac{n\,r^2(\bar{x},\gamma)}{\lambda^2}\right)$

from a table of the $\chi^2$ distribution.


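4. Proof that the difference $P(\gamma,\lambda \mid 1/\sqrt{N}) - P(\gamma,\lambda)$ is of the order $1/N^2$. It
is clear that $P(\gamma,\lambda \mid \bar{x})$ is an even function of $\bar{x}$. Hence, in the expansion of
$P(\gamma,\lambda \mid \bar{x})$ in a power series in $\bar{x}$, only even powers will occur. Terminating
the Taylor expansion (in section 8 we prove its validity) at the fourth term,
we have

(4.1) $P(\gamma,\lambda \mid \bar{x}) = P(\gamma,\lambda \mid 0) + \frac{\bar{x}^2}{2}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0} + \frac{\bar{x}^4}{4!}\,\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi}$

where $0 \leq \xi \leq \bar{x}$.

The expected value of $P(\gamma,\lambda \mid \bar{x})$ (considering $\bar{x}$ as a random variable) is
equal to $P(\gamma,\lambda)$. Since the expected value of $\bar{x}^2$ is $1/N$ and the expected value
of

$\frac{\bar{x}^4}{4!}\,\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi}$

is of the order $1/N^2$ (this is proved in section 9), we obtain from (4.1)

(4.2) $P(\gamma,\lambda) = P(\gamma,\lambda \mid 0) + \frac{1}{2N}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0} + O\left(\frac{1}{N^2}\right).$

On the other hand, substituting $1/\sqrt{N}$ for $\bar{x}$ in (4.1) we obtain

(4.3) $P\left(\gamma,\lambda \,\Big|\, \frac{1}{\sqrt{N}}\right) = P(\gamma,\lambda \mid 0) + \frac{1}{2N}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0} + \frac{1}{4!\,N^2}\,\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi'}$

where $0 \leq \xi' \leq 1/\sqrt{N}$. Hence, since the last term of the right member
of (4.3) is of the order $1/N^2$,

(4.4) $P\left(\gamma,\lambda \,\Big|\, \frac{1}{\sqrt{N}}\right) = P(\gamma,\lambda \mid 0) + \frac{1}{2N}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0} + O\left(\frac{1}{N^2}\right).$

From (4.2) and (4.4) it follows that

(4.5) $P(\gamma,\lambda) - P\left(\gamma,\lambda \,\Big|\, \frac{1}{\sqrt{N}}\right) = O\left(\frac{1}{N^2}\right).$

Thus, this difference approaches zero rapidly as $N \to \infty$.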


5. Computation of the value $\lambda$ for which $P(\gamma,\lambda \mid 1/\sqrt{N})$ takes a preassigned
value $\beta$. Denote by $\chi^2_{n,\beta}$ that value for which $P(\chi^2_n \geq \chi^2_{n,\beta}) = \beta$. This value can
be obtained from a table of the $\chi^2$ distribution. From (3.6) and (3.7) it follows
that the required value $\lambda^*$ of $\lambda$ is given by the root of the equation

(5.1) $\frac{n}{\lambda^2}\, r^2\left(\frac{1}{\sqrt{N}},\gamma\right) = \chi^2_{n,\beta}.$

Thus, the desired value of $\lambda^*$ is given by

(5.2) $\lambda^* = \sqrt{\frac{n}{\chi^2_{n,\beta}}}\; r\left(\frac{1}{\sqrt{N}},\gamma\right).$

The value $r\left(\frac{1}{\sqrt{N}},\gamma\right)$ is defined by (3.5) and can be obtained from a table of the
normal distribution.³
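The computation of $\lambda^*$ can be sketched in a few lines. The code below is an added illustration rather than the authors' procedure: it replaces the normal table by a bisection search on the error function, and it takes the $\chi^2$ percentage point as an input supplied by the reader. The function names are assumptions, and the two $\chi^2$ values used (0.003932 for $n = 1$, $\beta = .95$; 1.6465 for $n = 8$, $\beta = .99$) are assumed to come from a standard $\chi^2$ table.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def r_root(xbar, gamma, tol=1e-12):
    """Root r of Phi(xbar + r) - Phi(xbar - r) = gamma, found by bisection."""
    lo, hi = 0.0, abs(xbar) + 10.0  # coverage at hi is essentially 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(xbar + mid) - normal_cdf(xbar - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tolerance_factor(N, gamma, chi2_n_beta):
    """lambda* = sqrt(n / chi2_{n,beta}) * r(1/sqrt(N), gamma), with n = N - 1."""
    n = N - 1
    return sqrt(n / chi2_n_beta) * r_root(1.0 / sqrt(N), gamma)


# chi-square percentage points taken from a standard table (assumed inputs).
lam_n2 = tolerance_factor(2, 0.95, 0.003932)  # N = 2, gamma = .95, beta = .95
lam_n9 = tolerance_factor(9, 0.95, 1.6465)    # N = 9, gamma = .95, beta = .99

print(round(lam_n2, 3), round(lam_n9, 3))
```

Under these assumptions the two factors come out close to 37.67 and 4.55, in line with the values quoted in the numerical examples of section 7.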


6. Lower and upper limits for $P(\gamma,\lambda)$. As mentioned in section 2, $P(\gamma,\lambda)$ is
equal to the expected value of $P(\gamma,\lambda \mid \bar{x})$. Thus,

(6.1) $P(\gamma,\lambda) = \frac{\sqrt{N}}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} P(\gamma,\lambda \mid \bar{x})\, e^{-N\bar{x}^2/2}\,d\bar{x}.$

To obtain upper and lower limits for $P(\gamma,\lambda)$, we shall construct upper and lower
limits for the integral on the right-hand side of (6.1). It can easily be seen
that $P(\gamma,\lambda \mid \bar{x})$ is a strictly decreasing function of $\bar{x}^2$. Hence, to obtain lower
and upper limits for the integral in the right member of (6.1) we can proceed
as follows: Choose a positive constant $d$ and a positive integer $k$. Denote by
$a_i$ the probability that $id \leq \bar{x} \leq (i+1)d$, $(i = 0, 1, \cdots, k-1)$, and let $a_k$ be the
probability that $\bar{x} \geq kd$. Then $2\sum_{i=0}^{k} a_i P(\gamma,\lambda \mid id)$ is an upper bound, and
$2\sum_{i=1}^{k} a_{i-1} P(\gamma,\lambda \mid id)$ is a lower bound of the integral in question. Thus

(6.2) $P(\gamma,\lambda) \geq 2\sum_{i=1}^{k} a_{i-1}\, P(\gamma,\lambda \mid id)$

and

(6.3) $P(\gamma,\lambda) \leq 2\sum_{i=0}^{k} a_i\, P(\gamma,\lambda \mid id).$

The two limits can be brought arbitrarily close to each other by choosing $d$
sufficiently small and $k$ sufficiently large. A method of computing $P(\gamma,\lambda \mid \bar{x})$
for any given value $\bar{x}$ has been described in section 3 and the quantities $a_i$ can
be obtained from a table of the normal distribution. The amount of computational
work, however, increases rapidly with increasing $k$.
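The bounds (6.2) and (6.3) are easy to evaluate by machine. The sketch below is an added illustration under stated assumptions: it treats the case $N = 9$ (so that $n = 8$ is even and the $\chi^2$ tail has a closed form), uses $\gamma = .95$ and $\lambda = 4.550$, and picks $d = 0.01$, $k = 200$ arbitrarily. All function names are hypothetical, and the normal and $\chi^2$ tables are replaced by direct computation.

```python
from math import erf, exp, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def r_root(xbar, gamma, tol=1e-12):
    """Root r of Phi(xbar + r) - Phi(xbar - r) = gamma (equation (3.5))."""
    lo, hi = 0.0, abs(xbar) + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(xbar + mid) - normal_cdf(xbar - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_tail_even(x, n):
    """P(chi^2_n >= x) for even n: exp(-x/2) * sum_{j < n/2} (x/2)^j / j!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, n // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def conditional_prob(gamma, lam, xbar, n):
    """P(gamma, lambda | xbar) via (3.6) and (3.7); n must be even here."""
    r = r_root(xbar, gamma)
    return chi2_tail_even(n * r * r / (lam * lam), n)


def bounds(gamma, lam, N, d, k):
    """Lower bound (6.2) and upper bound (6.3) for P(gamma, lambda)."""
    n, root_n = N - 1, sqrt(N)
    # a_i = P(i d <= xbar <= (i+1) d) for i < k, and a_k = P(xbar >= k d),
    # where xbar is normal with mean 0 and variance 1/N.
    a = [normal_cdf((i + 1) * d * root_n) - normal_cdf(i * d * root_n)
         for i in range(k)]
    a.append(1.0 - normal_cdf(k * d * root_n))
    p = [conditional_prob(gamma, lam, i * d, n) for i in range(k + 1)]
    upper = 2.0 * sum(a[i] * p[i] for i in range(k + 1))
    lower = 2.0 * sum(a[i - 1] * p[i] for i in range(1, k + 1))
    return lower, upper


lower, upper = bounds(0.95, 4.550, 9, d=0.01, k=200)
print(lower, upper)
```

With a fine grid the two bounds nearly coincide; a coarser grid (larger $d$, smaller $k$) trades accuracy for less work, which is the compromise the text alludes to for hand computation.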


3 The Statistical Research Group computed, under the supervision of Albert H. Bowker,
a table of tolerance limit factors $\lambda$ (see formula 5.2) for $\beta$ = .75, .90, .95, .99; $\gamma$ = .75, .90,
.95, .99, .999; N = 2 (1) 102 (2) 180 (5) 300 (10) 400 (25) 750 (50) 1000. Mr. Bowker also
developed an asymptotic formula for $\lambda$ (published elsewhere in this issue of the Annals)
which, when $\beta \leq .99$, $\gamma \leq .999$, and $N \geq 160$, agrees with (5.2) to within 1 unit in the third
significant figure. The Applied Mathematics Panel plans to publish the table and a brief
explanation of tolerance limits in the volume entitled Techniques of Statistical Analysis
described in the footnote on page 217.
7. Approximate determination of the tolerance limits. The exact tolerance
limits are given by $\bar{x} - \lambda s$ and $\bar{x} + \lambda s$ where $\lambda$ is the root of the equation in $\lambda$

(7.1) $P(\gamma,\lambda) = \beta.$

This equation has exactly one root in $\lambda$, since $P(\gamma,\lambda)$ is a strictly increasing
function of $\lambda$. Denote this root by $\lambda = \lambda(\beta,\gamma)$. Thus, the exact tolerance
limits are given by $\bar{x} - \lambda(\beta,\gamma)s$ and $\bar{x} + \lambda(\beta,\gamma)s$.

We have seen in section 4 that $P(\gamma,\lambda \mid 1/\sqrt{N})$ closely approximates $P(\gamma,\lambda)$, the
difference being of the order $1/N^2$. Thus, a close approximation to $\lambda(\beta,\gamma)$ can
be obtained by solving the equation in $\lambda$,

(7.2) $P\left(\gamma,\lambda \,\Big|\, \frac{1}{\sqrt{N}}\right) = \beta.$

This equation has again exactly one root in $\lambda$, since $P(\gamma,\lambda \mid 1/\sqrt{N})$ is a strictly
increasing function of $\lambda$. Denote the root of equation (7.2) by $\lambda = \lambda^*(\beta,\gamma)$.
Thus approximate tolerance limits are given by $\bar{x} - \lambda^*(\beta,\gamma)s$ and $\bar{x} + \lambda^*(\beta,\gamma)s$.

In section 5 it has been shown that

(7.3) $\lambda^*(\beta,\gamma) = \sqrt{\frac{n}{\chi^2_{n,\beta}}}\; r$

where $n = N - 1$, $\chi^2_{n,\beta}$ is that number for which the probability that $\chi^2$ with $n$
degrees of freedom exceeds this number is $\beta$, and $r$ is the root of the equation

(7.4) $\frac{1}{\sqrt{2\pi}}\int_{1/\sqrt{N} - r}^{1/\sqrt{N} + r} e^{-t^2/2}\,dt = \gamma.$

The number $\chi^2_{n,\beta}$ can be obtained from a table of the $\chi^2$ distribution and $r$ can be
determined from a table of the normal distribution.

Since $\lambda^*(\beta,\gamma)$ is only an approximation to $\lambda(\beta,\gamma)$, $P[\gamma,\lambda^*(\beta,\gamma)]$ will differ slightly
from $\beta$. To judge the goodness of the approximation of $\lambda^*(\beta,\gamma)$ to the exact
value $\lambda(\beta,\gamma)$, it is desirable to derive upper and lower limits for the difference
$P[\gamma,\lambda^*(\beta,\gamma)] - \beta$. Such limits can be obtained by computing upper and lower
limits for $P[\gamma,\lambda^*(\beta,\gamma)]$ using the method described in section 6.

We cite here a few numerical examples to show the goodness of the approximation.

 N  | γ   | β   | λ*(β,γ) | Upper limit of P[γ,λ*(β,γ)] | Lower limit of P[γ,λ*(β,γ)]
----|-----|-----|---------|-----------------------------|----------------------------
  2 | .95 | .95 | 37.674  | .95202                      | .95077
  9 | .95 | .99 |  4.550  | .98989                      | .98908
 25 | .95 | .95 |  2.631  | .95161                      | .94393
 25 | .95 | .99 |  2.972  | .99024                      | .98813













8. Validity of the Taylor expansion of $P(\gamma,\lambda \mid \bar{x})$. We shall show that $P(\gamma,\lambda \mid \bar{x})$
has derivatives of all orders at every point $\bar{x}$, $\gamma$ and $\lambda$ being fixed. This is
sufficient to validate the Taylor expansion used in section 4.

For typographical convenience write

$r(\bar{x},\gamma) = R.$

We have

(8.1) $\frac{1}{\sqrt{2\pi}}\int_{\bar{x}-R}^{\bar{x}+R} e^{-t^2/2}\,dt = \gamma.$

Differentiating (8.1) with respect to $\bar{x}$ we obtain

(8.2) $\left(1 + \frac{dR}{d\bar{x}}\right) e^{-\frac{1}{2}(\bar{x}+R)^2} = \left(1 - \frac{dR}{d\bar{x}}\right) e^{-\frac{1}{2}(\bar{x}-R)^2}$

whence

(8.3) $\frac{dR}{d\bar{x}} = \tanh \bar{x}R.$

Now the analytic function $\tanh z$ of the complex variable $z$ has only purely imaginary
singularities. Hence $R$ possesses derivatives of all orders for all real values
of $\bar{x}$.
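The step from (8.2) to (8.3) is not written out in the text. The following short derivation is added as a worked check; it uses only $\bar{x}$ and $R$ as defined above.

```latex
\begin{align*}
\left(1+\frac{dR}{d\bar{x}}\right)e^{-\frac12(\bar{x}+R)^2}
  &= \left(1-\frac{dR}{d\bar{x}}\right)e^{-\frac12(\bar{x}-R)^2}
  && \text{(8.2)}\\
\left(1+\frac{dR}{d\bar{x}}\right)e^{-\bar{x}R}
  &= \left(1-\frac{dR}{d\bar{x}}\right)e^{\bar{x}R}
  && \text{multiplying both sides by } e^{\frac12(\bar{x}^2+R^2)}\\
\frac{dR}{d\bar{x}}\left(e^{\bar{x}R}+e^{-\bar{x}R}\right)
  &= e^{\bar{x}R}-e^{-\bar{x}R}\\
\frac{dR}{d\bar{x}}
  &= \tanh \bar{x}R. && \text{(8.3)}
\end{align*}
```

Since $\tanh \bar{x}R$ vanishes at $\bar{x} = 0$ and has the sign of $\bar{x}$ (as $R > 0$), $R$ has a minimum at $\bar{x} = 0$.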

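Now

$P(\gamma,\lambda \mid \bar{x}) = P\left(s \geq \frac{R}{\lambda}\right) = 1 - k\int_0^{R/\lambda} t^{n-1} e^{-nt^2/2}\,dt$

where $k$ is a constant. Hence from (8.3)

(8.4) $\frac{\partial P}{\partial\bar{x}} = -kR^{n-1}\lambda^{-n} e^{-nR^2/2\lambda^2} \tanh \bar{x}R.$

The right member of (8.4) is a product of functions which are analytic in the
entire (complex) $R$ plane by a function which possesses derivatives of all orders
for every real $\bar{x}$. Since $R$ possesses a derivative (with respect to $\bar{x}$) for all real
$\bar{x}$, it follows that $P$ possesses derivatives of all orders for every real $\bar{x}$.


9. Proof that the expected value of $\frac{\bar{x}^4}{4!}\,\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$ is $O\left(\frac{1}{N^2}\right)$.
Since $R$ is a minimum at $\bar{x} = 0$ it follows that $P(\gamma,\lambda \mid \bar{x})$ has a maximum there.
Hence, from (4.1), the quantity

$\bar{x}^2\left(\frac{1}{2}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0} + \frac{\bar{x}^2}{4!}\,\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi}\right)$

is never positive. Therefore

$\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi} \leq -\frac{12}{\bar{x}^2}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0}.$

Consequently $\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$ is bounded above for $|\bar{x}| > \delta$, where $\delta > 0$ is
arbitrarily small. Since $P$ possesses everywhere derivatives of all orders, the fourth
derivative is continuous and hence bounded above for $|\bar{x}| \leq \delta$. From this we
obtain that $\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$ is bounded above for every real $\bar{x}$.

Since $P(\gamma,\lambda \mid \bar{x})$ is always positive we have, from (4.1), that

$\frac{\partial^4 P}{\partial\bar{x}^4}\bigg|_{\bar{x}=\xi} > -\frac{4!}{\bar{x}^4}\left[P(\gamma,\lambda \mid 0) + \frac{\bar{x}^2}{2}\,\frac{\partial^2 P}{\partial\bar{x}^2}\bigg|_{\bar{x}=0}\right].$

For $|\bar{x}|$ greater than a sufficiently large number $C$, the left member of the
above inequality is thus bounded below. For $|\bar{x}| \leq C$ we have that $\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$
is bounded below because it is continuous. Hence $\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$ is bounded
below for every real $\bar{x}$.

Since $\frac{\partial^4 P}{\partial\bar{x}^4}\Big|_{\bar{x}=\xi}$ is bounded above and below for every real $\bar{x}$, the desired
result follows.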
APPROXIMATE FORMULAS FOR THE PERCENTAGE POINTS
AND NORMALIZATION OF $t$ AND $\chi^2$ ¹


By HENRY GOLDBERG² AND HARRIET LEVINE
Statistical Research Group, Columbia University


1. Introduction. The $\chi^2$ distribution and Student's $t$-distribution are functions
of a parameter $n$ (degrees of freedom) and approach the normal distribution
as $n$ approaches infinity. The normal distribution is a good approximation
to these distributions for large $n$. For small or moderate $n$, a better
approximation may be obtained by using a function of $t$ (or $\chi^2$) which approaches
the normal distribution more rapidly as $n$ increases. Hotelling and Frankel
[7] pointed out that an additional advantage of the normalization of a distribution
is that further statistical tests are possible with the normalized variate.
Normalizing $t$ (or $\chi^2$) is equivalent to transforming it into a function which is
normally distributed to a required degree of approximation; that is, a normally
distributed variate of zero mean and unit variance is expressed as a function of
$t$ (or $\chi^2$) in powers of $1/n$.

The reverse problem of expressing $t$ (or $\chi^2$) as a function of a normally distributed
variate of zero mean and unit variance in powers of $1/n$ is also of practical
importance in connection with significance tests for which the significance
levels, or percentage points, of the $t$ and $\chi^2$ distributions are required.

Cornish and Fisher [1] (see also [2]) have given a method for the normalization
of distributions which approach normality as the number of degrees of freedom, 
n, increases and whose cumulants are expressed in power series of 1/n, so that 
the order of magnitude of the rth cumulant is that of n™“~”. A method has 
also been given for expressing a variate with such a distribution as a function 
of a normally distributed variate of zero mean and unit variance in powers of 
1/n. 

It is the purpose of this note to apply the Cornish-Fisher method (1) to the
derivation of asymptotic formulas for the percentage points of the $t$ and $\chi^2$
distributions and (2) to the normalization of these distributions. Tables are
given which indicate the accuracy of these approximations and compare them
with other approximations. Tables are also given to facilitate the calculation
of the approximations for the percentage points of $t$ and $\chi^2$.


1 This paper reports work done in the Statistical Research Group, Division of War Research,
Columbia University under contract OEMsr-618 with the Applied Mathematics
Panel, National Defense Research Committee, Office of Scientific Research and Development.
The work was first reported in an unpublished memorandum, "Application of the
Cornish-Fisher method to an approximation of the significance levels of $t$ and $\chi^2$" (SRG
number 507, April 28, 1945).

2 Henry Goldberg died April 19, 1945.
2. The Cornish-Fisher method.³ Consider the random variable y with probability
distribution function f(y), expected value E(y), and variance $\sigma^2(y)$.
Let $K_r$ denote the rth cumulant of y and $\alpha_r$ denote the rth relative
cumulant of y; i.e., $\alpha_r = K_r/\sigma^r(y)$. Let x denote a normally
distributed variate with zero mean and unit variance.
For every p, (0 < p < 1), let $y_p$ be defined by

$$\int_{-\infty}^{y_p} f(y)\,dy = p$$

and $x_p$ by

$$\int_{-\infty}^{x_p} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = p.$$

That is, corresponding to every $y_p$ there is an $x_p$ having the same
probability integral (p). The Cornish-Fisher method for expressing a normally
distributed variate with zero mean and unit variance as a function of a
standardized variate with the same probability integral gives

$$x_p \sim b_0 + b_1 z_p + b_2 z_p^2 + b_3 z_p^3 + b_4 z_p^4 + b_5 z_p^5 + \cdots \tag{1}$$

where $z_p$ is the standardized variate corresponding to $y_p$; i.e.,

$$z_p = \frac{y_p - E(y)}{\sigma(y)},$$

and the $b_i$ are defined in terms of the relative cumulants.
Cornish and Fisher give also the following expansion for a standardized
variate as a function of a normally distributed variate:

$$z_p \sim c_0 + c_1 x_p + c_2 x_p^2 + c_3 x_p^3 + c_4 x_p^4 + c_5 x_p^5 + \cdots \tag{2}$$

where the $c_i$ are defined in terms of the relative cumulants.


3. An approximation for the percentage points of Student's t-distribution.
The standardized variate $z = t\left(\frac{n-2}{n}\right)^{1/2}$ can be
expressed as a function of the normal variate, x, in powers of 1/n by using the
Cornish-Fisher equation (2). Omitting terms of degree greater than two in 1/n
gives, after simplification, the following asymptotic expansion for t:

$$t \sim x + \frac{x^3 + x}{4n} + \frac{5x^5 + 16x^3 + 3x}{96n^2} + \cdots \tag{3}$$


3 Churchill Eisenhart suggested the use of the Cornish-Fisher method for
obtaining percentage points of the chi-square distribution not given in existing
tables, a problem which arose in several connections, including the computation
of a table of factors for tolerance limits for normal distributions according to
two formulas devised in the Statistical Research Group, one by A. Wald and
J. Wolfowitz and the other by Albert H. Bowker, both of which are published
elsewhere in this issue of the Annals of Math. Stat. The table will be included
in a volume by the Statistical Research Group, Techniques of Statistical
Analysis, to be published by the McGraw-Hill Book Company in 1946; its
preparation, including the work reported in the present paper, was directed by
Albert H. Bowker; the Statistical Research Group was directed by W. Allen Wallis.
For simplicity, the subscript p which appears in the Cornish-Fisher equation
(2) has been dropped. It should be understood, however, that the x and t used
in expansion (3) have the same probability integral. It is interesting to note
that the first two terms were derived by Peiser [4].
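
As a rough illustration of how expansion (3) is applied, the following Python sketch evaluates the one-, two- and three-term approximations to a percentage point of t. The function name `t_percentage_point` is hypothetical, and the normal deviate x is obtained from the standard library's `statistics.NormalDist` instead of a printed table; both choices are assumptions of this sketch and not part of the paper.

```python
from statistics import NormalDist


def t_percentage_point(p, n, terms=3):
    """Approximate the p-th percentage point of Student's t with n degrees
    of freedom, using the first `terms` terms (1, 2 or 3) of expansion (3)."""
    x = NormalDist().inv_cdf(p)                # normal deviate with probability integral p
    f1 = (x**3 + x) / 4                        # coefficient of 1/n
    f2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96   # coefficient of 1/n^2
    series = [x, f1 / n, f2 / n**2]
    return sum(series[:terms])


if __name__ == "__main__":
    # Three-term approximations for p = .975 at several degrees of freedom.
    for n in (2, 10, 20, 120):
        print(n, round(t_percentage_point(0.975, n), 4))
```

Setting `terms=2` gives the two-term form attributed to Peiser, so the same function can be used to compare the two- and three-term approximations.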


TABLE 1

Table of Polynomials Required for the Approximation for the Percentage Points
of the t-distribution*

Probability
Integral (p) |  x_p = x  |  f_1(x)  |  f_2(x)
-------------|-----------|----------|-----------
   .999      | 3.090232  | 8.150129 | 19.692529
   .9975     | 2.807034  | 6.231221 | 12.850916
   .995      | 2.575829  | 4.916548 |  8.834762
   .99       | 2.326348  | 3.729074 |  5.719746
   .975      | 1.959964  | 2.372271 |  2.822499
   .95       | 1.644854  | 1.523769 |  1.420203
   .90       | 1.281552  |  .846585 |   .570891
   .75       |  .674490  |  .245335 |   .079490

* This table can be used for determining x, $f_1(x)$ and $f_2(x)$ corresponding
to the complements of the selected values of p by using the relations

$$x_{1-p} = -x_p, \qquad f_1(-x) = -f_1(x), \qquad f_2(-x) = -f_2(x).$$


To facilitate the use of the approximation, tables of the required polynomials
in x have been computed for selected probability integrals. The approximation
can be written

$$t \sim x + \frac{f_1(x)}{n} + \frac{f_2(x)}{n^2} + \cdots$$

where

$$f_1(x) = \frac{x^3 + x}{4}$$

and

$$f_2(x) = \frac{5x^5 + 16x^3 + 3x}{96}.$$

Table 1 gives values of $x_p$ (or x), $f_1(x)$ and $f_2(x)$ for selected values
of the probability integral p. Table 2 gives approximate and exact percentage
points of t for selected values of p and degrees of freedom. The exact values
were taken from Merrington [5]. Table 2 shows the high degree of accuracy of the
three-term approximation for n > 10 and the superiority of this approximation
over the two-term approximation derived by Peiser.

TABLE 2

Comparative Table of Approximate and Exact Values of the Percentage Points
of the t-distribution

Probability  | Degrees of |   Approximate Percentage Point     | Exact Percentage
Integral (p) | Freedom    | One term | Two terms | Three terms | Point
-------------|------------|----------|-----------|-------------|-----------------
   .9975     |     1      |  2.8070  |  9.0383   |  21.8892    | 127.32
             |     2      |          |  5.9226   |   9.1354    |  14.089
             |    10      |          |  3.4302   |   3.5587    |   3.5814
             |    20      |          |  3.1186   |   3.1507    |   3.1534
             |    40      |          |  2.9628   |   2.9708    |   2.9712
             |    60      |          |  2.9109   |   2.9145    |   2.9146
             |   120      |          |  2.8590   |   2.8599    |   2.8599
   .9950     |     1      |  2.5758  |  7.4924   |  16.3271    |  63.657
             |     2      |          |  5.0341   |   7.2428    |   9.9248
             |    10      |          |  3.0675   |   3.1558    |   3.1693
             |    20      |          |  2.8217   |   2.8437    |   2.8453
             |    40      |          |  2.6987   |   2.7043    |   2.7045
             |    60      |          |  2.6578   |   2.6602    |   2.6603
             |   120      |          |  2.6168   |   2.6174    |   2.6174
   .9750     |     1      |  1.9600  |  4.3322   |   7.1547    |  12.706
             |     2      |          |  3.1461   |   3.8517    |   4.3027
             |    10      |          |  2.1972   |   2.2254    |   2.2281
             |    20      |          |  2.0786   |   2.0856    |   2.0860
             |    40      |          |  2.0193   |   2.0210    |   2.0211
             |    60      |          |  1.9995   |   2.0003    |   2.0003
             |   120      |          |  1.9797   |   1.9799    |   1.9799
   .9500     |     1      |  1.6449  |  3.1686   |   4.5888    |   6.3138
             |     2      |          |  2.4067   |   2.7618    |   2.9200
             |    10      |          |  1.7972   |   1.8114    |   1.8125
             |    20      |          |  1.7210   |   1.7246    |   1.7247
             |    40      |          |  1.6829   |   1.6838    |   1.6839
             |    60      |          |  1.6702   |   1.6706    |   1.6707
             |   120      |          |  1.6576   |   1.6577    |   1.6577
   .7500     |     1      |  0.6745  |   .9198   |    .9993    |   1.0000
             |     2      |          |   .7972   |    .8170    |    .8165
             |    10      |          |   .6990   |    .6998    |    .6998
             |    20      |          |   .6868   |    .6870    |    .6870
             |    40      |          |   .6806   |    .6807    |    .6807
             |    60      |          |   .6786   |    .6786    |    .6786
             |   120      |          |   .6765   |    .6765    |    .6766
4. An approximation for the percentage points of the $\chi^2$ distribution. The
standardized variate $z = \dfrac{\chi^2 - n}{\sqrt{2n}}$ can be expressed as a
function of the normal variate, x, in powers of 1/n by using the Cornish-Fisher
equation (2). Retaining terms in $n^{-3/2}$ gives, after simplification, the
following asymptotic expansion for $\chi^2$:

$$\chi^2 \sim n + G_1(x)\,n^{1/2} + G_2(x) + \frac{G_3(x)}{n^{1/2}} + \frac{G_4(x)}{n} + \frac{G_5(x)}{n^{3/2}} + \cdots \tag{4}$$

where

$$G_1(x) = \sqrt{2}\,x$$

$$G_2(x) = \tfrac{2}{3}\,(x^2 - 1)$$

$$G_3(x) = \frac{1}{9\sqrt{2}}\,(x^3 - 7x)$$

$$G_4(x) = -\frac{1}{405}\,(6x^4 + 14x^2 - 32)$$

$$G_5(x) = \frac{1}{4860\sqrt{2}}\,(9x^5 + 256x^3 - 433x).$$

TABLE 3

Table of Polynomials Required for the Approximation for the Percentage Points
of the $\chi^2$ distribution*

Probability
Integral (p) |  G_1(x)  |  G_2(x)  |  G_3(x)  |  G_4(x)   |  G_5(x)
-------------|----------|----------|----------|-----------|----------
   .999      | 4.370248 | 5.699690 |  .619006 | -1.602112 | 1.273498
   .9975     | 3.969745 | 4.586292 |  .193953 | -1.113149 |  .875184
   .995      | 3.642773 | 3.756598 | -.073888 |  -.802518 |  .622768
   .99       | 3.289953 | 2.941263 | -.290266 |  -.541971 |  .411597
   .975      | 2.771808 | 1.894306 | -.486382 |  -.272398 |  .194832
   .95       | 2.326174 | 1.137029 | -.554981 |  -.122957 |  .077898
   .90       | 1.812388 |  .428250 | -.539450 |  -.017722 |  .002186
   .75       |  .953873 | -.363376 | -.346842 |   .060220 | -.030881

* This table can be used for determining the $G_i(x)$ for values of x
corresponding to the complements of the selected values of p by using the
relations

$$x_{1-p} = -x_p, \qquad G_i(-x) = (-1)^i\,G_i(x), \quad \text{for } i = 1, \ldots, 5.$$
As before, the subscript p which appears in the Cornish-Fisher equation (2) has
been dropped. The x and $\chi^2$ which are used in expansion (4) have the same
probability integral. The first four terms were derived by Peiser [4].
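
The following Python sketch shows one way expansion (4) might be evaluated. The polynomials $G_1$ to $G_5$ are written out as $\sqrt{2}x$, $\tfrac{2}{3}(x^2-1)$, $(x^3-7x)/(9\sqrt{2})$, $-(6x^4+14x^2-32)/405$ and $(9x^5+256x^3-433x)/(4860\sqrt{2})$, which reproduce the entries of Table 3. The function name `chi2_percentage_point` and the use of `statistics.NormalDist` for the normal deviate are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def chi2_percentage_point(p, n):
    """Approximate the p-th percentage point of chi-square with n degrees of
    freedom from expansion (4), retaining terms through n**(-3/2)."""
    x = NormalDist().inv_cdf(p)   # normal deviate with probability integral p
    r2 = sqrt(2.0)
    g1 = r2 * x
    g2 = 2.0 * (x**2 - 1.0) / 3.0
    g3 = (x**3 - 7.0 * x) / (9.0 * r2)
    g4 = -(6.0 * x**4 + 14.0 * x**2 - 32.0) / 405.0
    g5 = (9.0 * x**5 + 256.0 * x**3 - 433.0 * x) / (4860.0 * r2)
    root_n = sqrt(n)
    return n + g1 * root_n + g2 + g3 / root_n + g4 / n + g5 / (n * root_n)


if __name__ == "__main__":
    for n in (2, 10, 20, 30):
        print(n, round(chi2_percentage_point(0.95, n), 3))
```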

Table 3 gives values of the $G_i(x)$ for selected values of the probability
integral p. Table 4 compares various approximations with the exact percentage


TABLE 5

Comparative Table of Approximate and Exact Values of the Probability Integral of t

     |      n = 1       |      n = 2       |      n = 10      |      n = 20
  t  | Approx. | Exact  | Approx. | Exact  | Approx. | Exact  | Approx. | Exact
-----|---------|--------|---------|--------|---------|--------|---------|-------
 0.1 |  .5311  | .5317  |  .5351  | .5353  |  .5388  | .5388  |  .5393  | .5393
 1   |  .7734  | .7500  |  .7917  | .7887  |  .8296  | .8296  |  .8354  | .8354
 3   | 1.0000  | .8976  | 1.0000  | .9523  |  .9954  | .9933  |  .9967  | .9965
 5   | 1.0000  | .9372  | 1.0000  | .9811  | 1.0000  | .9997  | 1.0000  | 1.0000
 6   | 1.0000  | .9474  | 1.0000  | .9867  | 1.0000  | .9999  | 1.0000  | 1.0000

TABLE 6

Comparative Table of Approximate and Exact Values of the Probability Integral of $\chi^2$

          |      n = 2       |      n = 10      |      n = 20      |      n = 29
 $\chi^2$ | Approx. | Exact  | Approx. | Exact  | Approx. | Exact  | Approx. | Exact
----------|---------|--------|---------|--------|---------|--------|---------|-------
    1     |  .3963  | .3935  |  .0010  | .0002  |  .0000  | .0000  |  .0000  | .0000
    5     |  .9646  | .9179  |  .1098  | .1088  |  .0004  | .0003  |  .0000  | .0000
   10     | 1.0000  | .9933  |  .5594  | .5595  |  .0323  | .0318  |  .0005  | .0004
   20     | 1.0000  | 1.0000 |  .9768  | .9707  |  .5420  | .5421  |  .1071  | .1071
   30     | 1.0000  | 1.0000 | 1.0000  | .9991  |  .9305  | .9301  |  .5860  | .5860
   50     |         |        |         |        |         |        |  .9916  | .9910


points of $\chi^2$ for selected values of p and degrees of freedom. The Peiser
four-term approximation, the Wilson-Hilferty approximation,

$$\chi^2_p = n\left[1 - \frac{2}{9n} + x_p\sqrt{\frac{2}{9n}}\right]^3,$$

and the Fisher approximation,

$$\chi^2_p = \tfrac{1}{2}\left[x_p + \sqrt{2n - 1}\right]^2,$$

are given for comparison. The exact values were taken from Thompson [6].
Table 4 shows the high degree of accuracy, and the general superiority of the
Cornish-Fisher approximation, for n > 10. For low probabilities (.005) the
Peiser approximation is often better than the full series; for small n, (1, 2),
the Wilson-Hilferty approximation is often better.
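
For readers who wish to reproduce such a comparison, the two competing approximations can be sketched in Python as below. The sketch assumes the usual forms of the Wilson-Hilferty and Fisher approximations, $n[1 - 2/(9n) + x_p\sqrt{2/(9n)}]^3$ and $\tfrac12[x_p + \sqrt{2n-1}]^2$; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_hilferty(p, n):
    """Wilson-Hilferty approximation to the p-th percentage point of chi-square."""
    x = NormalDist().inv_cdf(p)
    return n * (1.0 - 2.0 / (9.0 * n) + x * sqrt(2.0 / (9.0 * n))) ** 3


def fisher(p, n):
    """Fisher approximation: sqrt(2*chi2) treated as normal about sqrt(2n - 1)."""
    x = NormalDist().inv_cdf(p)
    return 0.5 * (x + sqrt(2.0 * n - 1.0)) ** 2


if __name__ == "__main__":
    for n in (1, 2, 10, 30):
        print(n, round(wilson_hilferty(0.95, n), 4), round(fisher(0.95, n), 4))
```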


5. Normalization of $t$ and $\chi^2$. The Cornish-Fisher equation (1) applied
to the t-distribution or, alternatively, a formal reversion of the power series
(3) gives the asymptotic expansion

$$x \sim t\left[1 - \frac{t^2 + 1}{4n} + \frac{13t^4 + 8t^2 + 3}{96n^2} + \cdots\right]. \tag{5}$$

Expansion (5) agrees with the first three terms of an expansion derived by
Hotelling and Frankel [7].
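
A minimal Python sketch of this normalization follows, assuming expansion (5) is truncated after the term in $1/n^2$, that is, $x \approx t[1 - (t^2+1)/(4n) + (13t^4+8t^2+3)/(96n^2)]$. The names `normalize_t` and `approx_t_cdf` are hypothetical; the second function returns the normal probability integral at the normalized value, which is the "approximate" quantity compared with exact values in the accuracy study.

```python
from statistics import NormalDist


def normalize_t(t, n):
    """Normal deviate with approximately the same probability integral as t
    on n degrees of freedom, from expansion (5) through the 1/n^2 term."""
    return t * (1.0
                - (t**2 + 1.0) / (4.0 * n)
                + (13.0 * t**4 + 8.0 * t**2 + 3.0) / (96.0 * n**2))


def approx_t_cdf(t, n):
    """Approximate probability integral of t via the normal distribution."""
    return NormalDist().cdf(normalize_t(t, n))


if __name__ == "__main__":
    for n in (1, 2, 10, 20):
        print(n, round(approx_t_cdf(1.0, n), 4))
```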

Applying the Cornish-Fisher equation (1) to the $\chi^2$ distribution gives the
expansion


; 
D ™~ BRR80 4 ant ) ~ 0004 8469x" + 29056 
“ ~ 38880 ri 68649n + [128469x° + 29056] 
. , 2 : a —— io ‘ an 
(6) ae [53553x° + 2208" — 386] + = (34257x° + 792x! + 238,'] 
3 


- “ [25221x* + 304x°] + — x? + ah, 

6. Accuracy of the normalizations of $t$ and $\chi^2$. The accuracy of the
normalization (5) of t may be judged from Table 5, which compares the
approximate value of the probability integral with the exact value. The
approximate value is the normal probability integral corresponding to the value
of x computed from (5) for the given values of t and n. The exact values were
obtained from Student's tables [8]. For fixed n, the approximation improves as
t decreases from moderate to small values. The approximation appears to improve
as t increases from moderate values (about 3) to large values because of the
more rapid approach to unity of the probability integral of a normal variate.

The accuracy of the normalization (6) of $\chi^2$ may be judged from Table 6,
which compares the approximate value of the probability integral with the exact
value. The approximate value is the normal probability integral corresponding
to the value of x computed from (6) for the given values of $\chi^2$ and n. The
exact values were obtained from the table of Pearson [9].
THE EFFECT ON A DISTRIBUTION FUNCTION OF SMALL CHANGES
IN THE POPULATION FUNCTION


By Burton H. Camp
Wesleyan University


1. Summary. It is generally assumed in the application of distribution
theory that, if the actual population function is not very different from the one
used in the theory, then the true sampling distribution of a statistic will not be
very different from the one obtained in the theory. But elsewhere in mathematics
we do not assert that a conclusion will be only slightly modified by a small
deviation in the hypothesis. This paper presents some theorems which are
useful in determining the maximum effect on a sampling distribution of certain
kinds of small changes in the population function. In particular, if the
population is denoted by the function $\phi(t)$, if a sample of n independent
measurements $(t_1, \cdots, t_n)$ is taken from this population, if a statistic
$x = g(t_1, \cdots, t_n)$ is formed from the sample, and if D(x) denotes the
distribution of this statistic; then, when $\phi(t)$ is changed by a small
proportionate amount to $\phi_1(t)$, D(x) will be changed to $D_1(x)$, and the
relation between D and $D_1$ will be subject to the inequality:

$$\left|\int_a^b (D - D_1)\,dx\right| \le \epsilon \int_a^b D(x)\,dx,$$

where

$$\epsilon = (1 + \delta)^n - 1, \quad\text{and}\quad |\phi_1/\phi - 1| \le \delta.$$


2. It is generally assumed in the application of distribution theory that, if 
the actual population function is not very different from the one used in the 
theory, then the true sampling distribution of a statistic will be not very different 
from the one obtained in the theory. For example, we commonly apply to 
practical problems the distribution theory that has been obtained on the hy- 
pothesis that the population is normally distributed even though we know that 
our actual populations are only approximately normal in form, and we commonly 
assume that our results are approximately correct. But elsewhere in mathematics
we do not assert that a conclusion will be only slightly modified if we only
slightly modify the hypothesis. Our unwillingness to do this in other branches
of mathematics is illustrated by the following example.

Example 1. Let $y = \phi(t)$ have the derivative $y' = \phi'(t)$. Let $\phi(t)$
be replaced by $\phi_1(t)$, where $\phi_1 - \phi = s(t)\phi(t)$, and
$|s(t)| \le \epsilon$, $\epsilon$ being small. We have thus chosen to make
$(\phi_1 - \phi)$ small relative to $\phi$ rather than small absolutely so that
this example may be useful in another connection. The derivative of $\phi_1$ may
of course differ very greatly from $\phi'(t)$, as for example in some of the
approximations made by a few terms of a Fourier series; and it would be a major
error to assume that the two derivatives are approximately equal. How can we be
sure that, in the process of finding a distribution function, we are not making
an error of the same sort?¹

The following theorems partly answer this question. The theorems will first
be stated and proved in great generality. Then we shall return to the functions
in Example 1 as a special case. We shall be concerned with a sample consisting
of a single observation of n measurements $(t_1, \cdots, t_n)$ drawn from the
multivariate universe $\psi(t_1, \cdots, t_n)$, or, more briefly, with the
vector T as a sample from the n-way universe $\psi(T)$. Throughout this paper
$\phi$ and $\psi$ shall be functions which are non-negative and whose integrals
over the entire spaces of their definition are unity. Let the statistics
$(x_1, \cdots, x_m)$, or more briefly the vector X, be constructed from T thus:

$$x_1 = g_1(T), \ \cdots, \ x_m = g_m(T). \tag{1}$$

If now p represents any measurable point set in X space and if dX is used for
$(dx_1 \cdots dx_m)$ and dT for $(dt_1 \cdots dt_n)$, a fundamental theorem [1]
of distribution theory asserts that, if q is the point set in T space for which
X is in p, then the distribution D(X) is determined by the equation,

$$\int_p D(X)\,dX = \int_q \psi(T)\,dT, \quad\text{if these integrals exist.} \tag{2}$$


THEOREM 1. Using the foregoing notation, let $\psi(T)$ be replaced by
$\psi_1(T)$ and let $\psi_1(T) - \psi(T) = \psi(T)S(T)$, where
$|S| \le \epsilon$, and as a consequence let D(X) be replaced by $D_1(X)$; then

$$\left|\int_p D_1\,dX - \int_p D\,dX\right| \le \epsilon \int_p D(X)\,dX \le \epsilon. \tag{3}$$

To prove these inequalities we merely need to notice that the point set q
depends on the g's but not on the universe, and that therefore we may use the
same p and q as in (2) in the following equation which determines $D_1$:

$$\int_p D_1(X)\,dX = \int_q \psi_1(T)\,dT. \tag{4}$$

Subtracting (2) from (4) we obtain

$$\left|\int_p D_1\,dX - \int_p D\,dX\right| = \left|\int_q (\psi_1 - \psi)\,dT\right| = \left|\int_q \psi S\,dT\right| \le \epsilon \int_q \psi\,dT = \epsilon \int_p D\,dX \le \epsilon, \tag{5}$$

since $\psi$ is never negative, and the integral of D is never greater than
unity. It should be noticed that the final inequality of (5) is independent of
the g's, although this is not true of the preceding inequalities, which do
depend on the g's because they involve p and q.

1 The general question being raised here has been approached heretofore from
different points of view. In particular, other exact population functions
besides the normal have been studied, and in some cases the distribution theory
has not been greatly disturbed as a result. Also, the effects of slight changes
in the parameters of a population function have been studied.

COROLLARY. In particular² let $\psi = \phi(t_1) \cdots \phi(t_n)$, where
$\phi(t)$ defines a one-way universe function, and $t_1, \cdots, t_n$ are
independent samples from it. Let $x = g(t_1, \cdots, t_n)$. Then, if $\phi(t)$
is replaced by $\phi_1(t)$, and if $\phi_1 - \phi = s(t)\phi(t)$, and if
$|s(t)| \le \delta$, and if D(x) is the distribution of x before the
replacement, and $D_1(x)$ is the corresponding distribution after the
replacement,

$$\left|\int_a^b (D_1 - D)\,dx\right| \le \epsilon \int_a^b D\,dx \le \epsilon,$$

where

$$\epsilon = (1 + \delta)^n - 1 \quad\text{and}\quad -\infty \le a < b \le \infty.$$

This corollary follows from the theorem because the universes are

$$\psi(t_1, \cdots, t_n) = \phi(t_1) \cdots \phi(t_n),$$

and

$$\psi_1(t_1, \cdots, t_n) = \phi(t_1) \cdots \phi(t_n)\,[1 + s(t_1)] \cdots [1 + s(t_n)],$$

so that, in the notation of the theorem,

$$\psi_1(T) = \psi(T) + \psi(T)S(T),$$

where

$$S(T) = [s(t_1) + \cdots + s(t_n)] + [s(t_1)s(t_2) + \cdots + s(t_{n-1})s(t_n)] + \cdots + [s(t_1) \cdots s(t_n)].$$

Hence

$$|S| \le n\delta + \frac{n!}{2!\,(n-2)!}\,\delta^2 + \cdots + \delta^n = (1 + \delta)^n - 1 = \epsilon.$$

The interval (a, b) now replaces the point set p of the theorem.
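
The bound on S(T) can be checked numerically. The sketch below is only an illustration: it draws hypothetical values of $s(t_i)$ uniformly in $[-\delta, \delta]$, forms $S(T) = \prod_i [1 + s(t_i)] - 1$, and compares the largest $|S|$ found with $\epsilon = (1+\delta)^n - 1$. The function names and the choice $\delta = 0.01$, $n = 25$ are assumptions made for the example.

```python
import random


def s_of_t(s_values):
    """S(T) = prod(1 + s(t_i)) - 1, the relative change in the joint density."""
    prod = 1.0
    for s in s_values:
        prod *= 1.0 + s
    return prod - 1.0


def epsilon(delta, n):
    """The bound (1 + delta)**n - 1 of the corollary."""
    return (1.0 + delta) ** n - 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    delta, n = 0.01, 25
    worst = max(
        abs(s_of_t([rng.uniform(-delta, delta) for _ in range(n)]))
        for _ in range(10_000)
    )
    print(worst, "<=", epsilon(delta, n))
```

The bound is attained when every $s(t_i)$ equals $\delta$, which is why it grows so quickly with n.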

This theorem and its corollary are powerful in that they may be applied to all
statistics, but they are weak because of the restrictions on S(T) and s(t). It
is to be noted also that the corollary is ineffective when n is large, a
difficulty which seems to the author to be implicit in the sampling process.
The restrictions on s(t) make it impracticable to apply the corollary to the
following example since, as will be observed, if $|t| > c$,
$\phi_1 - \phi = -\phi$, and so then $|s| = 1$; and when $\delta = 1$,
$\epsilon = 2^n - 1$.

Example 2. Let $\phi(t) = (2\pi)^{-1/2} e^{-t^2/2}$ in $(-\infty, \infty)$, and
let $\phi_1(t) = A(2\pi)^{-1/2} e^{-t^2/2}$ in $(-c, c)$ and let
$\phi_1(t) = 0$ if $|t| > c$, where c is not infinite and A is so chosen that
the integral of $\phi_1$ over $(-\infty, \infty)$ is unity.

This type of example is important because, in the attempt to apply the theory
of normal distributions to practical matters, the first discrepancy that appears
is that in the theory the given distribution is infinite in extent while in
practice it is finite. The following theorem generalizes the preceding one so as
to permit it to apply to this example.

2 One could as well use $\phi^{(1)}(t_1) \cdots \phi^{(n)}(t_n)$, but we choose
the simpler case on account of its importance.

THEOREM 2. Let all of T-space be divisible into two parts, $Q_0$ and $Q_1$,
satisfying the following conditions. In $Q_0$ let
$\psi_1(T) - \psi(T) = S(T)\psi(T)$, and let $|S(T)| \le \epsilon$. In $Q_1$
let $\psi_1(T) = 0$, and let

$$\int_{Q_1} \psi(T)\,dT \le \epsilon_1.$$

Then

$$\left|\int_p D_1\,dX - \int_p D\,dX\right| \le \epsilon \int_p D\,dX + \epsilon_1 \le \epsilon + \epsilon_1.$$

It is not required that $Q_0$ or $Q_1$ be the totality of points for which its
attendant conditions are true.


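PROOF. As before, if the integrals exist,

$$\int_p D\,dX = \int_q \psi\,dT, \quad\text{and}\quad \int_p D_1\,dX = \int_q \psi_1\,dT.$$

Hence

$$\int_p D_1\,dX - \int_p D\,dX = \int_q (\psi_1 - \psi)\,dT = \int_{q_0} (\psi_1 - \psi)\,dT + \int_{q_1} (\psi_1 - \psi)\,dT,$$

where $q_0$ is that part of q which is in $Q_0$, and $q_1$ is that part of q
which is in $Q_1$.

$$\left|\int_p D_1\,dX - \int_p D\,dX\right| \le \left|\int_{q_0} (\psi_1 - \psi)\,dT\right| + \left|\int_{q_1} (\psi_1 - \psi)\,dT\right|. \tag{6}$$

$$\left|\int_{q_0} (\psi_1 - \psi)\,dT\right| = \left|\int_{q_0} S\psi\,dT\right| \le \epsilon \int_{q_0} \psi\,dT \le \epsilon \int_q \psi\,dT = \epsilon \int_p D\,dX, \quad\text{because } \psi \ge 0. \tag{7}$$

$$\left|\int_{q_1} (\psi_1 - \psi)\,dT\right| = \left|\int_{q_1} \psi\,dT\right| \le \left|\int_{Q_1} \psi\,dT\right| \le \epsilon_1, \tag{8}$$

because $\psi_1 = 0$ in $q_1$. The inequalities (7) and (8), when substituted
in (6), prove the theorem.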
COROLLARY. In particular, let $\psi$, $\phi$, and x be defined as in the
corollary to Theorem 1, and let $\phi_1(t)$ be so defined that, if
$|t| \le c$, $\phi_1(t) - \phi(t) = s(t)\phi(t)$, where as before
$|s(t)| \le \delta$, and $\epsilon = (1 + \delta)^n - 1$; and, if $|t| > c$,
let $\phi_1(t) = 0$. Also let

$$\int_{Q_1} \phi(t_1) \cdots \phi(t_n)\,dT \le \epsilon_1,$$

where $Q_1$ is the set where $|t_i| > c$ for at least one value of i. Then

$$\left|\int_a^b D_1(x)\,dx - \int_a^b D(x)\,dx\right| \le \epsilon \int_a^b D(x)\,dx + \epsilon_1 \le \epsilon + \epsilon_1,$$

provided these integrals exist.

PROOF. This corollary is implied in the theorem if we let
$\psi(T) = \phi(t_1) \cdots \phi(t_n)$ and
$\psi_1(T) = \phi_1(t_1) \cdots \phi_1(t_n)$, and then let $Q_0$ be the point
set in T-space where $|t_i| \le c$ for all values of i, and $Q_1$ be the point
set where $|t_i| > c$ for at least one value of i. As in the corollary to
Theorem 1, p becomes the interval (a, b).

Example 3. Let $\phi$ and $\phi_1$ be as in Example 2, and choose c = 3. Then
A = 1/0.9973 = 1.0027, and

$$\int_{Q_1} \phi(t_1) \cdots \phi(t_n)\,dT = 1 - (.9973)^n.$$

This quantity may be taken as $\epsilon_1$. Also

$$|(\phi_1 - \phi)/\phi| = |A - 1| = 0.0027.$$

This quantity may be taken as $\delta$. Then $\epsilon = (1.0027)^n - 1$. Hence

$$\left|\int_a^b D_1(x)\,dx - \int_a^b D(x)\,dx\right| \le \epsilon \int_a^b D(x)\,dx + \epsilon_1.$$

If n is not large, an approximate value for both $\epsilon$ and $\epsilon_1$ is
0.003n. This quantity is not particularly small unless n is small, but it could
not be expected to be very small since the corollary pertains to all statistics
of the form $x = g(t_1, \cdots, t_n)$.
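
The arithmetic of Example 3 is easy to tabulate. The following Python sketch, with the hypothetical function name `example3_bounds`, evaluates $\epsilon = (1.0027)^n - 1$ and $\epsilon_1 = 1 - (0.9973)^n$ for a few sample sizes and prints the rule of thumb 0.003n beside them; the particular values of n are chosen only for illustration.

```python
def example3_bounds(n, delta=0.0027, coverage=0.9973):
    """Return (epsilon, epsilon_1) for the truncated normal of Example 3.

    delta    -- bound on |phi_1/phi - 1| inside (-c, c), here A - 1
    coverage -- normal probability of |t| <= c, here c = 3
    """
    eps = (1.0 + delta) ** n - 1.0
    eps1 = 1.0 - coverage ** n
    return eps, eps1


if __name__ == "__main__":
    for n in (1, 5, 10, 25, 100):
        eps, eps1 = example3_bounds(n)
        print(n, round(eps, 5), round(eps1, 5), round(0.003 * n, 5))
```

For n = 100 the two bounds already differ noticeably from each other and from 0.003n, which illustrates the remark that the approximation holds only when n is not large.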

Example 4. In one of the author's earlier papers [2] he found the distribution
of the geometric mean, $x = (t_1 \cdots t_n)^{1/n}$, of n observations chosen
from the universe described by the so-called curve of equal facility, whose
equation is

$$y = \frac{1}{t\,\sigma\sqrt{2\pi}}\; e^{-(1/2\sigma^2)(\log t/G)^2}.$$

The author stated that there was about as good justification for assuming that
the distribution of statures was given by that universe as for assuming that it
was normal. After one more theorem we shall now be able to state that, if one
wishes to cling to the assumption that the distribution of statures is normal,
then the distribution of the geometric mean is close to the distribution found
in that earlier paper. We do need another theorem for this because we should be
dealing with two distributions, $\phi_1(t)$ and $\phi(t)$, which do not obey the
requirements of the corollary of Theorem 1, because they approach zero at
different rates as t becomes infinite, and do not obey the requirements of the
corollary of Theorem 2 because neither vanishes throughout the infinite
intervals for which $|t| > c$. But the following theorem and corollary will take
care of this and of similar cases. It will be observed that Theorem 3 includes
Theorem 2 as a special case.

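THEOREM 3. Using the foregoing notation, let all of T-space be divisible into
two parts $Q_0$ and $Q_1$ satisfying the following conditions. In $Q_0$ let
$\psi_1(T) - \psi(T) = S(T)\psi(T)$, and let $|S(T)| \le \epsilon$. Let
$T = Q_0 + Q_1$ and

$$\int_{Q_1} \psi_1(T)\,dT + \int_{Q_1} \psi(T)\,dT \le \epsilon_1.$$

Then

$$\left|\int_p D_1(X)\,dX - \int_p D(X)\,dX\right| \le \epsilon \int_p D(X)\,dX + \epsilon_1 \le \epsilon + \epsilon_1.$$

PROOF. As before,

$$\int_p D_1\,dX - \int_p D\,dX = \int_q (\psi_1 - \psi)\,dT = \int_{q_0} (\psi_1 - \psi)\,dT + \int_{q_1} (\psi_1 - \psi)\,dT,$$

$$\left|\int_p D_1\,dX - \int_p D\,dX\right| \le \left|\int_{q_0} (\psi_1 - \psi)\,dT\right| + \left|\int_{q_1} (\psi_1 - \psi)\,dT\right| = \mathrm{I} + \mathrm{II}.$$

$$\mathrm{I} \le \epsilon \int_p D\,dX \le \epsilon.$$

$$\mathrm{II} \le \int_{q_1} \psi_1\,dT + \int_{q_1} \psi\,dT \le \int_{Q_1} \psi_1\,dT + \int_{Q_1} \psi\,dT \le \epsilon_1.$$

These inequalities together prove the theorem.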


COROLLARY. In particular, let $\psi$, $\phi$, $\phi_1$, and x be as in the
corollary of Theorem 2, except that now, instead of requiring $\phi_1(t)$ to
vanish when $|t| > c$ we shall let $Q_1$ and $\epsilon_1$ be so chosen that

$$\int_{Q_1} \phi_1(t_1) \cdots \phi_1(t_n)\,dT + \int_{Q_1} \phi(t_1) \cdots \phi(t_n)\,dT \le \epsilon_1.$$

Then

$$\left|\int_a^b D_1(x)\,dx - \int_a^b D(x)\,dx\right| \le \epsilon \int_a^b D(x)\,dx + \epsilon_1 \le \epsilon + \epsilon_1.$$

As stated before, the inequalities of this paper apply to all statistics for
which the integrals involved exist. It seems probable that closer inequalities
could be devised by placing appropriate restrictions on the g functions which
define these statistics.
AN EXPERIMENTAL DESIGN FOR SLOPE-RATIO ASSAYS

By C. I. Bliss

Connecticut Agricultural Experiment Station and Yale University


1. Summary. When the response to a drug is a linear function of arithmetic 
dosage units, the relative potency of two preparations can be computed as a 
slope-ratio assay. Their dosage-response curves are computed by solving three
simultaneous equations to obtain the common intercept a', the slope of the
standard, $b_1$, and the slope of the unknown, $b_2$. The method is applicable
to certain
microbiological assays for the vitamins. Usually several unknowns are assayed 
at one time with a single standard. Their calculation is simplified when such 
assays meet the following requirements: (1) restriction of treatments to the zone 
within which the response is related linearly to the dose, (2) equal spacing of 
doses on an arithmetic scale beginning with the negative control, (3) an equal 
number (k) of doses of standard and of each unknown and (4) r replicates for 
each dose of unknown, h’ replicates for the negative control and h replicates for 
each dose of the standard. 


2. Method of Analysis. The design and analysis of assays for measuring drug 
potency has been developed largely about the linear relation between response 
and the logarithm of the dose of many drugs. An alternative procedure is 
available when some measure of the response is related linearly to arithmetic 
dosage units. Recently Finney [5] has applied the technique to microbiological 
assays of the vitamins. The relationship is also suitable for experiments with 
toxic agents on micro-organisms, where the length of exposure to treatment is 
the dose. Since potency is measured from the ratio of the slope of the dosage- 
response curve for an unknown to that for the standard preparation, Wood [6] 
has termed the method a “‘slope-ratio assay.” 

The validity of quantitative biological assays depends upon a qualitative 
similarity between the standard and the active agent of the unknown. When 
the response is related linearly to the log-dose, this is determined by testing the 
parallelism of the lines fitted separately to the results for the standard and to 
those for the unknown preparation. If the departure from parallelism is within 
the sampling error, the combined slope is determined from the data on both 
preparations and used in computing potency and its error. The analogous test 
in slope-ratio assays is the convergence of the lines relating response to arith- 
metic dose at zero content of drug, using drug as a generic term which includes 
vitamins, poisons and physical agents. When the curves for the standard and 
the unknown are computed separately, their zero intercept should agree within 
the experimental error. In assays meeting this requirement, the curves are
computed so that they are forced to intersect at zero dose. The curves

$$y_1 = a' + b_1 x_1$$

and

$$y_2 = a' + b_2 x_2$$

are fitted by solving three simultaneous equations to obtain the three
statistics, a', $b_1$ and $b_2$, which are the best estimates of their
respective parameters. Finney [5] has illustrated the technique with data from
the microbiological assay of nicotinic acid and given a suitable test for
convergence as well as the error of the estimated potency.

The calculation described by Finney is flexible but not adapted for routine 
use. With certain restrictions in design, the calculation can be reduced to a 
practicable form for the assay of (m — 1) unknowns against a standard prepara- 
tion. These restrictions are as follows: 

1. Doses both of standard and of unknowns must fall within the range for 
which some function of the response is related linearly to an arithmetic scale of 
dosage units with convergence at zero dose. 

2. Within this range the doses (x) of standard and of all the unknowns must 
be spaced similarly and preferably equally on an arithmetic scale, beginning 
with the negative control (x = 0). 

3. The doses of each unknown must match those of the standard in respect
to both number (k) and their expected potencies, so far as the latter can be
judged in advance. Within an assay group there may be h' replicates of the
negative control, h replicates of each dose of the standard and r replicates of
each dose of each unknown.

4. Some element of randomization must be introduced within an assay group 
in respect to the preparation of the tubes, their handling and the reading of the 
results. Replicates of any given dose or of the negative control must not be 
prepared together. 


3. Computational Procedure. The simplified calculation of potency and its
error depends upon substituting the assumed for the actual doses. When
spaced equally on an arithmetic scale, they may be coded by using the numbers
1, 2, 3, ..., k, k being equal throughout the assay. The sums of the coded
doses, $S_1$, and of their squares, $S_2$, are then the same for each
preparation and may be entered in the equations for computing the inverse
matrix, of which the first three are

$$\begin{array}{rcl}
 & & i = 0, \quad i = 1, \quad i = 2 \\
N c_{0i} + hS_1 c_{1i} + rS_1 c_{2i} + \cdots &=& 1, \quad 0, \quad 0, \ \cdots \\
hS_1 c_{0i} + hS_2 c_{1i} &=& 0, \quad 1, \quad 0, \ \cdots \\
rS_1 c_{0i} + rS_2 c_{2i} &=& 0, \quad 0, \quad 1, \ \cdots
\end{array} \tag{1}$$

where the total number of observations is N = h' + kh + kr(m - 1). Multiplying
the last two rows by $-S_1/S_2$ and adding the products, we have

$$\left(N - \frac{hS_1^2}{S_2} - \frac{r(m-1)S_1^2}{S_2}\right) c_{0i} = 1, \quad -\frac{S_1}{S_2}, \quad -\frac{S_1}{S_2}, \ \cdots$$

where the subscript 1 refers to the standard and the assay includes 2 to m
unknown preparations. Substituting

$$D = NS_2 - hS_1^2 - r(m - 1)S_1^2,$$

this leads to the following reciprocal coefficients:

$$c_{00} = S_2/D$$

$$c_{0i} = c_{i0} = -S_1/D, \quad i = 1, 2, \cdots, m,$$

$$c_{11} = 1/hS_2 + S_1^2/DS_2$$

$$c_{ii} = 1/rS_2 + S_1^2/DS_2, \quad i = 2, 3, \cdots, m, \text{ and}$$

$$c_{ij} = S_1^2/DS_2 \quad\text{for } i, j = 1, 2, \cdots, m, \text{ where } i \ne j.$$

The reciprocal coefficients are computed from the sums of the doses and their
squares, which are the same for all preparations. The doses are multiplied by
the responses observed at each dosage level to obtain $T_i = S(xy_i)$ for any
given preparation. For the standard there will be h responses at each dose and
for each unknown r responses. Let $T = S(T_i)$ be the sum of these products over
all m preparations. The total response for all N observations S(y), including
the negative control, the standard, and all the unknowns, is designated as
$T_y$.

Using normal regression theory, the common intercept is computed as

$$a' = c_{00}T_y + c_{01}T.$$

Substituting the above reciprocal coefficients,

$$a' = (S_2T_y - S_1T)/D. \tag{2}$$

The slope of the standard is computed with the reciprocal coefficients as

$$b_1 = c_{01}T_y + c_{11}T_1 + c_{1i}T - c_{1i}T_1.$$

We may take advantage of the identities

$$c_{01} = -\frac{S_1}{S_2}\,c_{00} \quad\text{and}\quad c_{1i} = -\frac{S_1}{S_2}\,c_{01}$$

to obtain

$$b_1 = (c_{11} - c_{1i})T_1 - \frac{S_1}{S_2}\,a'$$

reducing to

$$b_1 = \frac{T_1}{hS_2} - \frac{a'S_1}{S_2}. \tag{3}$$

Similarly the slope of each unknown is equal to

$$b_i = c_{0i}T_y + c_{1i}T_1 + c_{ii}T_i + c_{ij}T - c_{ij}\{T_1 + T_i\}$$

where i, j = 2, 3, ..., m and j ≠ i. Since $c_{1i} - c_{ij} = 0$, this may be
reduced to

$$b_i = \frac{T_i}{rS_2} - \frac{a'S_1}{S_2}, \quad i = 2, 3, \cdots, m. \tag{4}$$

The computation is further simplified if the k doses of all preparations are
spaced not only similarly on an arithmetic scale but also at equal intervals.
In this case

$$S_1 = k(k + 1)/2 \quad\text{and}\quad S_2 = k(k + 1)(2k + 1)/6.$$

Substituting in equations (2), (3) and (4), the common intercept, the slope of
the standard and that of each unknown may be computed as

$$a' = \frac{2(2k + 1)T_y - 6T}{N(k - 1) + 3h'(k + 1)} \tag{5}$$

$$b_1 = \frac{3}{2k + 1}\left\{\frac{2T_1}{hk(k + 1)} - a'\right\} \tag{6}$$

$$b_i = \frac{3}{2k + 1}\left\{\frac{2T_i}{rk(k + 1)} - a'\right\}. \tag{7}$$

In computing the slope for each unknown in an assay the only variable is $T_i$.
The intercept and the slopes can be checked by substitution in the equation

$$2Na' + hk(k + 1)b_1 + rk(k + 1)(b_2 + \cdots + b_m) = 2T_y. \tag{8}$$
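
The routine calculation for equally spaced coded doses can be sketched as follows. The function computes the intercept from $a' = [2(2k+1)T_y - 6T]/[N(k-1) + 3h'(k+1)]$, each slope from $\frac{3}{2k+1}\{2T_i/(\text{replicates}\cdot k(k+1)) - a'\}$, and the left side of the check $2Na' + hk(k+1)b_1 + rk(k+1)(b_2 + \cdots + b_m)$, which should equal $2T_y$. The function name and the totals in the usage note are hypothetical; the totals were generated from exact straight lines and are not assay data.

```python
def slope_ratio_fit(k, h_prime, h, r, t_y, t_prod):
    """Common intercept and slopes for equally spaced coded doses 1..k.

    t_y    -- total of all N responses, S(y)
    t_prod -- [T_1, T_2, ..., T_m]: the sums S(x*y) for the standard followed
              by each unknown
    Returns (a', [b_1, ..., b_m], left side of the check equation).
    """
    m = len(t_prod)
    n_obs = h_prime + k * h + k * r * (m - 1)
    t_sum = sum(t_prod)
    a = (2 * (2 * k + 1) * t_y - 6 * t_sum) / (n_obs * (k - 1) + 3 * h_prime * (k + 1))
    factor = 3 / (2 * k + 1)
    slopes = [factor * (2 * t_prod[0] / (h * k * (k + 1)) - a)]
    slopes += [factor * (2 * t_i / (r * k * (k + 1)) - a) for t_i in t_prod[1:]]
    # Left side of the check equation; it should equal 2 * t_y.
    check = 2 * n_obs * a + k * (k + 1) * (h * slopes[0] + r * sum(slopes[1:]))
    return a, slopes, check


if __name__ == "__main__":
    # Hypothetical noise-free totals from y = 1 + 2x (standard) and
    # y = 1 + 1.5x (unknown), with k = 4 doses and h' = h = r = 2.
    print(slope_ratio_fit(4, 2, 2, 2, 88.0, [140.0, 110.0]))
```

With these totals the fit returns the generating intercept 1 and slopes 2 and 1.5, so the relative potency in coded units is 0.75.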


In terms of coded doses, the potency of an unknown (i) relative to that of the
standard (1) is computed as

$$J' = \frac{b_i}{b_1}. \tag{9}$$

Each J' is converted to original units by multiplying it by the ratio of the
dosage intervals, $I_1/I_i$, the potency being

$$J = \frac{b_i I_1}{b_1 I_i}. \tag{10}$$

The variance measuring the distribution of the observations about the m
lines may be determined as

$$s^2 = \frac{S(y^2) - a'T_y - b_1T_1 - \cdots - b_mT_m}{N - m - 1}. \tag{11}$$

The variation about the individual lines is assumed not to vary from one prepa- 
ration to another. This is more likely to be true when the assumed potencies 
differ but little from those computed from the assay, so that J’ differs relatively 
little from unity. 

The confidence limits for potency as estimated from the ratio of the slopes
may be computed from Fieller's basic formula [4]. For confidence limits, $X_L$,
at an appropriate level of significance, such as $P = 0.05$, $t$ is read from the
Student distribution for $N - m - 1$ degrees of freedom and entered with $s^2$ from
equation (11) in the equation
$$X_L^2(b_s^2 - c_{ss}s^2t^2) - 2X_L(b_ib_s - c_{si}s^2t^2) + (b_i^2 - c_{ii}s^2t^2) < 0, \tag{12}$$
where $i$ indicates one of the 2 to $m$ unknown preparations. When solved for $X_L$,
the limits may be written
$$X_L = \frac{b_ib_s - c_{si}s^2t^2 \pm st\sqrt{(c_{ss} - c_{si})b_i^2 + (c_{ii} - c_{si})b_s^2 + c_{si}(b_s - b_i)^2 - (c_{ss}c_{ii} - c_{si}^2)s^2t^2}}{b_s^2 - c_{ss}s^2t^2}, \tag{13}$$
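where $c_{ss} - c_{si} = 1/hS_2$, $c_{ii} - c_{si} = 1/rS_2$ and, as follows from these two identities,
$$c_{ss}c_{ii} - c_{si}^2 = \frac{1}{rhS_2^2} + \frac{(h + r)\,c_{si}}{rhS_2}.$$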
In all critical cases, the exact limits should be computed. 
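As an illustration of how the exact limits follow from inequality (12), the sketch below solves the quadratic directly. The function name `fieller_limits` and the numerical inputs are hypothetical, chosen only to exercise the formula; the arguments are the slopes, the three variance multipliers, $s^2$ from equation (11) and the tabled value of $t$.

```python
import math


def fieller_limits(b_i, b_s, c_ii, c_ss, c_si, s2, t):
    """Roots of the quadratic in inequality (12): limits for J' = b_i / b_s."""
    g = s2 * t * t
    a = b_s ** 2 - c_ss * g          # coefficient of X^2
    b = b_i * b_s - c_si * g         # half the (negated) coefficient of X
    c = b_i ** 2 - c_ii * g          # constant term
    if a <= 0:
        # The standard slope is not significantly different from zero,
        # so the inequality does not define a finite interval.
        raise ValueError("b_s is not significantly different from zero")
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a


# Hypothetical inputs, for illustration only.
lower, upper = fieller_limits(b_i=1.0, b_s=2.0, c_ii=0.02, c_ss=0.02,
                              c_si=0.005, s2=1.0, t=2.0)
```

The two roots bracket the point estimate $J' = b_i/b_s$, and at each root the left side of (12) is zero.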

In most slope-ratio assays the individual slopes differ very significantly from 
zero. Under these circumstances the approximate limits may be computed 
with reasonable accuracy from the variance of the estimated potency by the 
familiar formula for the variance of a ratio [1]. 


$$V(J') = s^2J'^2\left\{\frac{c_{ii}}{b_i^2} + \frac{c_{ss}}{b_s^2} - \frac{2c_{si}}{b_ib_s}\right\}
= \frac{s^2}{b_s^4}\left\{(c_{ii} - c_{si})b_s^2 + (c_{ss} - c_{si})b_i^2 + c_{si}(b_s - b_i)^2\right\}. \tag{14}$$
The discrepancies between the approximate and the exact limits are evident 
from a comparison of equations (13) and (14). When the doses are spaced at 
equal arithmetic intervals, equation (14) can be reduced to the more convenient 
form 





$$s_{J'}^2 = \frac{6s^2}{b_s^2(2k+1)}\left\{\frac{h + rJ'^2}{rhk(k+1)} + \frac{3(1 - J')^2}{N(k-1) + 3h'(k+1)}\right\}. \tag{15}$$

A major limitation to slope-ratio assays is the frequent curvature in the rela- 
tion between response and arithmetic dosage units. For this reason it is advis- 
able to use routinely four or more doses of each preparation. Occasionally an 
assay in which there is curvature at the highest dosage level may be salvaged by 
computing the potencies from the data of the smaller doses. The agreement of a 
given assay with the postulate upon which it is based may be tested objectively 
by an analysis of variance, segregating the sums of squares (a) for the agreement 
of the negative control with the intercept, (b) for the agreement of the individual 
curves at the intercept, (c) for agreement of the observations with straight lines 
fitted individually and (d) for the variation among the h replicates of the stand- 
ard, the h’ replicates of the negative control and the r replicates of the unknowns. 
The calculation of such an analysis is greatly facilitated by the recommended
design. Since it follows the usual pattern, it will not be described here. The
procedure has been tested with the data from an experiment on the depth dose
of x-rays [2] and has been applied to microbiological assays [3] in papers where
the reader will find the technique exemplified.
NOTES

This section is devoted to brief research and expository articles, notes on
methodology and other short items.

COMPUTATION OF FACTORS FOR TOLERANCE LIMITS ON A NORMAL
DISTRIBUTION WHEN THE SAMPLE IS LARGE¹

By ALBERT H. BOWKER

Columbia University

In their paper [1], Wald and Wolfowitz discuss the problem of finding tolerance
limits of the form $\bar{x} \pm \lambda s$ for a normal distribution. They propose the following
large sample formula for $\lambda$ which appears to be satisfactory for all practical
purposes for $N \ge 2$:
$$\lambda \sim \sqrt{\frac{n}{\chi^2_\beta}}\; r, \tag{1}$$
where $N$ is the number of observations ($n = N - 1$), $\gamma$ is the tolerance coefficient,
$\beta$ is the confidence coefficient, $r$ is defined by
$$\frac{1}{\sqrt{2\pi}}\int_{1/\sqrt{N} - r}^{1/\sqrt{N} + r} e^{-t^2/2}\,dt = \gamma,$$
and $\chi^2_\beta$ has the property that $P(\chi^2 > \chi^2_\beta) = \beta$ for $n$ degrees of freedom. To compute
$\lambda$, tables [2] or known approximations [3] for $\chi^2_\beta$ are customarily used, but the
computation of $r$, even for large $N$, is tedious, involving an iterative procedure.
The purpose of this note is to obtain an expansion of $r$ in terms of $1/\sqrt{N}$ and to
combine this expansion with a known one for $\chi^2_\beta$ to obtain an asymptotic formula
for $\lambda$.

To derive a large sample formula for $r$, consider the function
$$f(x, y) = \frac{1}{\sqrt{2\pi}}\int_{x-y}^{x+y} e^{-t^2/2}\,dt - \gamma, \tag{2}$$
where $1/\sqrt{N}$ and $r$ are replaced by $x$ and $y$. It is desired to express
$y$ as a power series in $x$. Let $y_0$ be defined by $f(0, y_0) = 0$. Since $f(x, y)$ is a
continuous function of $x$ and $y$, and $\partial f/\partial y \neq 0$ at $(0, y_0)$, the function $y(x)$ defined
implicitly by (2) is continuous. Since
$$\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = \tanh xy,$$
the higher derivatives of $y(x)$ exist and are continuous and $y(x)$ permits of a finite Taylor's
expansion. The coefficients of odd powers of $x$ drop out and we obtain
$$y = y_0 + \frac{y_0}{2}x^2 + \frac{3y_0 - 2y_0^3}{24}x^4 + O(x^6),$$
or returning to the original notation and retaining terms in $1/N$,
$$r \sim r_\infty\left(1 + \frac{1}{2N}\right). \tag{3}$$
If $x_p$ is defined by $\frac{1}{\sqrt{2\pi}}\int_{x_p}^{\infty} e^{-t^2/2}\,dt = p$ we know from [3] that
$$\chi^2_\beta \sim n - \sqrt{2n}\,x_{1-\beta} + \tfrac{2}{3}\left(x_{1-\beta}^2 - 1\right). \tag{4}$$
Proceeding formally and retaining terms in $1/N$ we obtain
$$\left(\frac{n}{\chi^2_\beta}\right)^{1/2} \sim 1 + \frac{x_{1-\beta}}{\sqrt{2N}} + \frac{5x_{1-\beta}^2 + 4}{12N}, \tag{5}$$
and multiplying by the expression for $r$ given by equation (3) we find the desired
expansion for $\lambda$:
$$\lambda \sim r_\infty\left(1 + \frac{x_{1-\beta}}{\sqrt{2N}} + \frac{5x_{1-\beta}^2 + 10}{12N}\right). \tag{6}$$
Recall that both $r_\infty$ and $x_{1-\beta}$ are readily obtainable from tables of the normal
curve; in fact, $r_\infty$ is defined by
$$\frac{1}{\sqrt{2\pi}}\int_{-r_\infty}^{r_\infty} e^{-t^2/2}\,dt = \gamma$$
and $x_{1-\beta}$ is defined by $\frac{1}{\sqrt{2\pi}}\int_{x_{1-\beta}}^{\infty} e^{-t^2/2}\,dt = 1 - \beta$.

A comparative table of approximate and exact values of $\lambda$ is given in Table 1.
From the table we see that for N = 800 the error is less than 1 in the 4th significant
figure, and for N = 160 the error is less than 1 in the 3rd significant
figure within the limits of $\beta$ and $\gamma$ considered. The approximation will be less
exact for higher values of $\beta$ and $\gamma$.

TABLE 1
Comparative Values of Exact and Approximate λ

               N = 50                       N = 100                      N = 160
  β     γ      Exact    Approx.  Diff.      Exact    Approx.  Diff.      Exact    Approx.  Diff.
 .75   .75    1.25480   1.25147  .00333    1.21808   1.21698  .00110    1.20161   1.20108  .00053
       .95    2.13774   2.13226  .00548    2.07533   2.07349  .00184    2.04728   2.04639  .00089
       .999   3.58821   3.57979  .00842    3.48401   3.48112  .00289    3.43704   3.43563  .00141
 .95   .75    1.39621   1.38467  .01154    1.31050   1.30670  .00380    1.27204   1.27022  .00182
       .95    2.37866   2.35921  .01945    2.23279   2.22635  .00644    2.16728   2.16420  .00308
       .999   3.99259   3.96080  .03179    3.74835   3.73776  .01059    3.63850   3.63341  .00509
 .99   .75    1.51184   1.48901  .02283    1.38251   1.37511  .00740    1.32566   1.32215  .00351
       .95    2.57565   2.53698  .03867    2.35546   2.34290  .01256    2.25865   2.25268  .00597
       .999   4.32325   4.25926  .06399    3.95429   3.93343  .02086    3.79189   3.78196  .00993

Comparative Values of Exact and Approximate λ (Continued)

               N = 500                      N = 800                      N = 1000
  β     γ      Exact    Approx.  Diff.      Exact    Approx.  Diff.      Exact    Approx.  Diff.
 .75   .75    1.17733   1.17724  .00009    1.17126   1.17122  .00004    1.16891   1.16888  .00003
       .95    2.00593   2.00578  .00015    1.99559   1.99552  .00007    1.99158   1.99153  .00005
       .999   3.36769   3.36744  .00025    3.35034   3.35022  .00012    3.34361   3.34352  .00009
 .95   .75    1.21501   1.21470  .00031    1.20062   1.20047  .00015    1.19502   1.19491  .00011
       .95    2.07013   2.06960  .00053    2.04562   2.04536  .00026    2.03608   2.03589  .00019
       .999   3.47547   3.47459  .00088    3.43433   3.43390  .00043    3.41831   3.41800  .00031
 .99   .75    1.24268   1.24208  .00060    1.22198   1.22169  .00029    1.21395   1.21374  .00021
       .95    2.11727   2.11626  .00101    2.08201   2.08152  .00049    2.06832   2.06797  .00035
       .999   3.55462   3.55292  .00170    3.49543   3.49460  .00083    3.47244   3.47186  .00058

¹ This paper reports work done in the Statistical Research Group, Division of War Research,
Columbia University, under Contract OEMsr-618 with the Applied Mathematics
Panel, National Defense Research Committee, Office of Scientific Research and Development.
The work was first reported in an unpublished memorandum, "Computation of
Factors for Tolerance Limits when the Sample is Large" (SRG No. 559, September 24,
1945). A brief account of the application of tolerance limits, including tables, will be
published in Techniques of Statistical Analysis described in the footnote on page 217.

REFERENCES
[1] A. Wald and J. Wolfowitz, "Tolerance limits for a normal distribution," Annals of
Math. Stat., Vol. 17 (1946), pp. 208-215.
[2] C. M. Thompson, "Tables of percentage points of the χ² distribution," Biometrika,
Vol. 32 (1941-42), pp. 188-9.
[3] Henry Goldberg and Harriet Levine, "Approximate formulas for the percentage
points and normalization of t and χ²," Annals of Math. Stat., Vol. 17 (1946).


THE PROBABILITY DISTRIBUTION OF THE MEASURE
OF A RANDOM LINEAR SET

By David F. Votaw, Jr.

Naval Ordnance Laboratory

1. Introduction. Consider a random sample $O_n(x_1, \cdots, x_n)$ of $n$ values of a
one-dimensional random variable $x$ having cumulative distribution function
$F(x)$. Let there be associated with each $x$ an interval of length $D$ centered at $x$
($D$ a positive constant). Let $S(O_n)$ denote the random set which is the point-set
sum of the $n$ intervals associated with $O_n$; $S(O_n)$ is a set of one or more intervals.
Let $S$ denote the measure of $S(O_n)$ ($S$ is the sum of lengths of the intervals
composing $S(O_n)$). Given $F$, $n$ and $D$, what is the probability function of $S$?
This note contains a solution of the problem for $F(x) = x$, $(0 < x < 1)$; the case
of $F(x) = \int_0^x He^{-Ht}\,dt$, $(0 < x < \infty;\ H > 0)$, is also treated.


2. Sampling from a uniform distribution. Let $y = S - D$. The range of
$y$ is $0 \le y \le m$, where $m$ denotes the minimum of 1 and $(n-1)D$. Let $x_1, \cdots, x_n$
be the sample values arranged in increasing order of magnitude. Make
the transformation
$$y_0 = x_1, \qquad y_i = x_{i+1} - x_i, \quad (i = 1, \cdots, n-1). \tag{2.1}$$
$y$ can be expressed as $\sum_{i=1}^{n-1} m(y_i, D)$, where $m(y_i, D)$ denotes the minimum of
$y_i$ and $D$. The probability function of $(y_0, y_1, \cdots, y_{n-1})$ is $n!\prod_{u=0}^{n-1} dy_u$
$(y_u > 0;\ \sum_{u=0}^{n-1} y_u < 1)$. If $m = (n-1)D$, then $y = (n-1)D$ if and only if $y_i > D$,
$(i = 1, \cdots, n-1)$; for a fixed $y_0$ it can be shown by use of the Dirichlet integral that
the volume of the $(n-1)$ dimensional region in which any point $(y_1, \cdots, y_{n-1})$
satisfies this condition is $[1 - y_0 - (n-1)D]^{n-1}/(n-1)!$. It follows that:
$$\Pr\{y = (n-1)D\} = n\int_0^{1-(n-1)D} [1 - y_0 - (n-1)D]^{n-1}\,dy_0 = [1 - (n-1)D]^n, \quad ((n-1)D < 1). \tag{2.2}$$


The probability that $Y < y < Y + \Delta Y$ (where $Y < m$ and $\Delta Y$ denotes an
arbitrarily small positive increment in $Y$) can be evaluated by determining
volumes of certain regions contained in the tetrahedron defined by $y_u > 0$,
$\sum_{u=0}^{n-1} y_u < 1$. Consider the following conditions:

(a) $qD \le Y < (q+1)D$ ($q = 0, 1, \cdots, M$; $M$ denotes the minimum of $(n-2)$ and the greatest integer less than $1/D$),
(b) $y_v > D$ $(v = 1, \cdots, j)$,
(c) $\sum_{v=1}^{j} y_v < 1 - y - y_0 + jD$,
(d) $y_v \le D$ $(v = j+1, \cdots, n-1)$.

The probability that $Y < y < Y + \Delta Y$ and that (b), (c) and (d) are satisfied is:
$$n!\int_{y=Y}^{Y+\Delta Y} B_j(y)\int_{y_0=0}^{1-y} A_j(y, y_0)\,dy_0\,dy, \tag{2.3}$$
where $A_j(y, y_0)$ denotes the $j$ dimensional volume of the region in which any
point $(y_1, \cdots, y_j)$ satisfies (b) and (c), and $B_j(y)$ denotes the $(n-j-2)$
dimensional volume of intersection of the hyperplane $\sum_{v=j+1}^{n-1} y_v = y - jD$ with an
$(n-j-1)$ dimensional cube $(0 < y_v < D)$. It is clear that if any other of
the $\binom{n-1}{j}$ combinations of $j$ $y$'s out of the set of $(n-1)$ $y$'s had been specified
in (b) and the $(n-j-1)$ complementary $y$'s had been specified in (d), the
corresponding $A_j(y, y_0)$ and $B_j(y)$ would be equal to those given in (2.3); hence
$$\Pr\{Y < y < Y + \Delta Y\} = n!\sum_{j=0}^{q}\binom{n-1}{j}\int_{y=Y}^{Y+\Delta Y} B_j(y)\int_{y_0=0}^{1-y} A_j(y, y_0)\,dy_0\,dy, \tag{2.4}$$
$$qD \le Y < (q+1)D,\quad Y < m,\quad (q = 0, 1, \cdots, M).$$
$A_j(y, y_0) = \dfrac{(1 - y - y_0)^j}{j!}$, and (see [1] and [2])
$$B_j(y) = \frac{1}{(n-j-2)!}\sum_{t=0}^{q-j}(-1)^t\binom{n-j-1}{t}[y - D(j+t)]^{n-j-2}. \tag{2.5}$$
From (2.4) and (2.5) it follows that the probability function of $y$, say $f_n(y)$, is:
$$f_n(y) = n!\sum_{j=0}^{q}\sum_{t=0}^{q-j}(-1)^t\binom{n-1}{j}\binom{n-j-1}{t}\frac{(1-y)^{j+1}[y - D(j+t)]^{n-j-2}}{(j+1)!\,(n-j-2)!}, \tag{2.6}$$
$$qD \le y < (q+1)D,\quad (q = 0, \cdots, M),\quad y < m.$$
$f_n(y)$ is not defined at $(n-1)D$ if $(n-1)D < 1$ (see (2.2)); if $m = 1$, the range
of definition of $f_n(y)$ as given in (2.6) is $y < 1$.

The cumulative distribution function of $y$ is continuous with the exception,
in the case of $(n-1)D < 1$, of a saltus of amount $[1 - (n-1)D]^n$ at
$y = (n-1)D$ (see (2.2)). The probability function $f_n(y)$ is continuous over the
range $0 < y < m$ with the exception, in the case of $n \ge 3$ and $(n-2)D < 1$,
of a simple discontinuity at $y = (n-2)D$.

For $n = 2$ and $D < 1$,
$$f_2(y) = 2(1 - y), \quad (0 < y < D),$$
and $\Pr\{y = D\} = (1 - D)^2$.
For $n = 3$ and $2D < 1$,
$$f_3(y) = 6(1 - y)y, \quad (0 < y < D),$$
$$f_3(y) = 6(1 - y)y - 12(1 - y)(y - D) + 6(1 - y)^2, \quad (D < y < 2D),$$
and $\Pr\{y = 2D\} = (1 - 2D)^3$.
The expected value, say $E(y)$, of $y$ is:
$$E(y) = \frac{n-1}{n+1}\left[1 - (1 - D)^{n+1}\right], \quad (D \le 1); \qquad E(y) = \frac{n-1}{n+1}, \quad (D \ge 1). \tag{2.7}$$
The expected value of $S$ is $D + E(y)$. $E(y)$ can be derived by use of (2.6)
or by use of a theorem of Robbins [3].
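The density of y, the point mass at (n - 1)D and the expected value can be cross-checked numerically. The sketch below is a direct transcription of (2.2), (2.6) and (2.7) as reconstructed above; the function names `density`, `atom` and `expected_y` are assumptions made for this illustration only.

```python
from math import comb, factorial


def density(y, n, D):
    """f_n(y) of (2.6) for 0 < y < m, where m = min(1, (n - 1) D); n >= 2."""
    m = min(1.0, (n - 1) * D)
    if not 0 < y < m:
        return 0.0
    q = min(int(y // D), n - 2)  # index of the interval qD <= y < (q + 1)D
    total = 0.0
    for j in range(q + 1):
        for t in range(q - j + 1):
            total += ((-1) ** t * comb(n - 1, j) * comb(n - j - 1, t)
                      * (1 - y) ** (j + 1) * (y - D * (j + t)) ** (n - j - 2)
                      / (factorial(j + 1) * factorial(n - j - 2)))
    return factorial(n) * total


def atom(n, D):
    """Pr{y = (n - 1) D} from (2.2); zero when (n - 1) D >= 1."""
    return max(0.0, 1 - (n - 1) * D) ** n


def expected_y(n, D):
    """E(y) from (2.7)."""
    return (n - 1) / (n + 1) * (1 - (1 - min(D, 1.0)) ** (n + 1))


# n = 3, D = 0.3: compare with the closed forms quoted in the text.
left = density(0.2, 3, 0.3)    # 6 (1 - y) y on (0, D)
right = density(0.45, 3, 0.3)  # three-term expression on (D, 2D)
```

Integrating `density` over (0, m) and adding `atom` should give total probability one, which is a convenient check on any transcription of (2.6).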


3. Probability that random linear set covers range of variate. Given that
$F(x) = x$, $(0 < x < 1)$, and $nD > 1$, what is the probability, say ${}_nP_D$, that
$S(O_n)$ contains the interval $(0 < x < 1)$? If $D < 1$, the interval is covered
if and only if (i), (ii) and (iii) below are all satisfied:

(i) $y_u < D$, $(u = 1, \cdots, n-1)$,
(ii) $\sum_{u=1}^{n-1} y_u > 1 - y_0 - \dfrac{D}{2}$,
(iii) $y_0 < \dfrac{D}{2}$.

${}_nP_D$ can be expressed as
$${}_nP_D = n!\int_{y_0=0}^{D/2}\int_{z=1-y_0-D/2}^{1-y_0} C_{n-1}(z)\,dz\,dy_0, \tag{3.1}$$
where $C_{n-1}(z)$ (see [2]) denotes the $(n-2)$ dimensional volume of the intersection
of the hyperplane $\sum_{u=1}^{n-1} y_u = z$ with an $(n-1)$ cube $0 < y_u < D$. It follows from
(2.5) and (3.1) that
$${}_nP_D = \sum_{u=0}^{[1/D]}(-1)^u\binom{n-1}{u}(1 - uD)^n - 2\sum_{u=0}^{[(1/D)-\frac12]}(-1)^u\binom{n-1}{u}\left(1 - uD - \frac{D}{2}\right)^n + \sum_{u=0}^{[(1/D)-1]}(-1)^u\binom{n-1}{u}(1 - uD - D)^n, \tag{3.2}$$
where $D < 1$ and $[x]$ denotes the greatest integer less than $x$. If $1 < D < 2$,
$${}_nP_D = 1 - 2\left(1 - \frac{D}{2}\right)^n.$$
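The covering probability of (3.2) can be evaluated with a short routine. In this sketch the three sums are truncated by keeping only terms whose base is positive, which is assumed here to be equivalent to the upper limits written with $[x]$; the function name `coverage_probability` is illustrative.

```python
from math import comb


def coverage_probability(n, D):
    """Probability that n intervals of length D, centered at uniform points
    on (0, 1), cover the whole unit interval (equation (3.2))."""
    def partial(shift):
        # Sum of (-1)^u C(n-1, u) (1 - uD - shift)^n over terms with positive base.
        total, u = 0.0, 0
        while u <= n - 1 and 1 - u * D - shift > 0:
            total += (-1) ** u * comb(n - 1, u) * (1 - u * D - shift) ** n
            u += 1
        return total

    return partial(0.0) - 2 * partial(D / 2) + partial(D)


p_two_points = coverage_probability(2, 0.8)    # D < 1
p_wide = coverage_probability(3, 1.5)          # 1 < D < 2, equals 1 - 2 (1 - D/2)^n
```

For n = 2 the first value can be confirmed by elementary geometry on the unit square, and the second agrees with the closed form quoted for 1 < D < 2.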
4. Sampling from $F(x) = \int_0^x He^{-Ht}\,dt$, $(0 < x < \infty;\ H > 0)$. If $F(x) = \int_0^x He^{-Ht}\,dt$,
the probability function of $S$ can be determined but is very cumbersome
in the form in which it is known to the writer. The characteristic function,
say $g(\theta)$, of the probability function of $S$ will be given instead. By use of (2.1)
it can be shown that:
$$g(\theta) = e^{i\theta D}\prod_{\lambda=1}^{n-1}\frac{\lambda H - i\theta e^{-D(\lambda H - i\theta)}}{\lambda H - i\theta}, \tag{4.1}$$
where $i = \sqrt{-1}$.
The expected value, $E(S)$, and variance, $\sigma_S^2$, of $S$ are:
$$E(S) = D + \sum_{\lambda=1}^{n-1}\frac{1 - e^{-DH\lambda}}{H\lambda}, \qquad \sigma_S^2 = \frac{1}{H^2}\sum_{\lambda=1}^{n-1}\frac{1 - e^{-2DH\lambda}}{\lambda^2} - \frac{2D}{H}\sum_{\lambda=1}^{n-1}\frac{e^{-DH\lambda}}{\lambda}. \tag{4.2}$$
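The moments in (4.2) can be compared with a direct simulation. The sketch below is illustrative only: the function names are assumptions, and the simulation measures the union of the intervals by merging them, without using the gap representation on which the formulas rest.

```python
import random
from math import exp


def mean_S(n, D, H):
    """E(S) from (4.2)."""
    return D + sum((1 - exp(-D * H * k)) / (H * k) for k in range(1, n))


def var_S(n, D, H):
    """Variance of S from (4.2)."""
    return sum((1 - exp(-2 * D * H * k)) / (H * k) ** 2
               - 2 * D * exp(-D * H * k) / (H * k) for k in range(1, n))


def simulate_S(n, D, H, rng):
    """Measure of the union of n intervals of length D centered at
    independent exponential variates with density H exp(-H t)."""
    xs = sorted(rng.expovariate(H) for _ in range(n))
    total, left, right = 0.0, xs[0] - D / 2, xs[0] + D / 2
    for x in xs[1:]:
        if x - D / 2 > right:        # a gap: close the current block
            total += right - left
            left = x - D / 2
        right = x + D / 2
    return total + (right - left)


rng = random.Random(0)
draws = [simulate_S(4, 0.5, 1.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

With a moderate number of draws the sample mean and variance should lie close to `mean_S` and `var_S`.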
INFORMATION GIVEN BY ODD MOMENTS

By Edmund Churchill

Rutgers University

The widespread use of the third moment about the mean as a measure of skewness
and the belief engendered by this use that a distribution is symmetric if its
third moment is zero prompt the question of how much information about a
distribution can be deduced from a knowledge of its odd moments. An answer
to this question is: Let $F(x)$, a cumulative distribution function; $\{\mu_{2n-1}\}$, $(n = 1, 2, \cdots)$,
a sequence of real numbers; and $\epsilon > 0$ be arbitrary. There exists a c.d.f.,
$F^*(x)$, having as odd moments the terms of the given sequence and such that
$$|F(x) - F^*(x)| < \epsilon, \quad \text{all } x. \tag{1}$$
If the mean of $F(x)$ is equal to $\mu_1$ and the variance of $F(x)$ is not zero, it can be
shown that $F^*(x)$ may be chosen so that in addition the variance of $F^*(x)$ is
equal to that of $F(x)$.

An immediate consequence of our statement is that a distribution need not be
symmetric even though all its odd moments vanish. Such an asymmetric distribution,
due to Stieltjes, is given by:
$$dF(x) = \tfrac{1}{48}\,e^{-|x|^{1/4}}\left(1 - k\sin|x|^{1/4}\right)dx, \quad -\infty < x < \infty, \tag{2}$$
$k = -1$ if $x < 0$, $k = 1$ if $x > 0$.

The proof of our statement will follow easily from the following:

LEMMA. Let $\{m_{2n-1}\}$, a sequence of real numbers, be given. There exists a c.d.f.
having as odd moments the given numbers.

We construct a sequence $\{H_n\}$ of increasing step-functions in such a manner
that for every $n$, the first $n$ odd moments of $H_n$ are the first $n$ of the given numbers,
and such that this sequence converges to a monotone function having all the
desired moments. A slight modification of this function will give the desired
c.d.f.

Let $H_0$ be identically zero. We form $H_1$ by adding to $H_0$ a jump or mass of $\tfrac12$
at $x = 2m_1$. In general, $H_k$ is formed from $H_{k-1}$ by adding to it $k$ masses chosen
so that their first $(k-1)$ odd moments are zero and so that the $k$th odd moment
of $H_k$ is $m_{2k-1}$. This we do by adding the masses $|x_j|$, $(j = 1, 2, \cdots, k)$, at the
points $e_j\,jp$ where the $x_j$'s are the solutions of:
$$\begin{aligned}
p\,x_1 + 2p\,x_2 + \cdots + kp\,x_k &= 0\\
p^3x_1 + (2p)^3x_2 + \cdots + (kp)^3x_k &= 0\\
&\;\;\vdots\\
p^{2k-3}x_1 + (2p)^{2k-3}x_2 + \cdots + (kp)^{2k-3}x_k &= 0\\
p^{2k-1}x_1 + (2p)^{2k-1}x_2 + \cdots + (kp)^{2k-1}x_k &= m_{2k-1} - m(H_{k-1}),
\end{aligned}$$
$m(H_{k-1})$ is the $k$th odd moment of $H_{k-1}$, $e_j$ is the sign of $x_j$ and $p$ is a parameter.
Since the determinant of this system is a Vandermonde determinant, there exists
a unique set of solutions for every non-zero value of the parameter. The masses
thus chosen clearly have the specified moments. Eliminating $p$ from the left
sides of the equations by division, it is apparent that the $x_j$'s are all linear functions
of $p^{-(2k-1)}$. Thus we may choose $p$ so large that the sum of the masses
added at this step does not exceed $1/2^k$. The absolute odd moments of orders less
than $(2k-1)$ of these $k$ masses are also linear functions of negative powers of $p$.
We may thus insure by further increasing our choice of $p$ that the $(2k-1-2r)$th
absolute moment of $H_k$ does not exceed the corresponding moment of $H_{k-1}$ by
more than $1/2^r$. For definiteness, we choose $p$ as the smallest number satisfying
these requirements.
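The linear system of the lemma can be solved exactly for small k, which makes the dependence of the masses on p easy to see. The sketch below is only an illustration under the reading of the system given above: the function name `lemma_masses` and the argument `target` (standing for $m_{2k-1} - m(H_{k-1})$) are assumptions, and exact rational arithmetic from the standard `fractions` module is used to avoid rounding.

```python
from fractions import Fraction


def lemma_masses(k, p, target):
    """Solve sum_j x_j (j p)^(2r-1) = 0 for r < k and = target for r = k."""
    p = Fraction(p)
    rows = []
    for r in range(1, k + 1):
        coeffs = [(j * p) ** (2 * r - 1) for j in range(1, k + 1)]
        rhs = Fraction(target) if r == k else Fraction(0)
        rows.append(coeffs + [rhs])
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(k):
        pivot = next(i for i in range(col, k) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for i in range(k):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return [rows[i][k] for i in range(k)]


x_small_p = lemma_masses(3, 1, 5)
x_large_p = lemma_masses(3, 2, 5)
total_mass_small = sum(abs(x) for x in x_small_p)
total_mass_large = sum(abs(x) for x in x_large_p)  # smaller: masses scale as p^-(2k-1)
```

Doubling p with k = 3 divides every $x_j$ by $2^5$, so the total added mass can be made as small as desired, as the argument requires.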

The first of these restrictions on $p$ insures that for each value of $x$, the sequence
$H_n(x)$ is increasing and bounded from above by one. The sequence of functions
thus converges to a monotone function $H^*(x)$ with the property that $H^*(-\infty) = 0$,
$H^*(\infty) \le 1$. The other restrictions on $p$ insure that the sequences of absolute
odd moments of all orders are uniformly bounded, a bound for the absolute
moments of order $2k - 1$ being one greater than the absolute moment of
this order of $H_k$. This in turn insures that the odd moments of $H^*(x)$ exist and
that they have the desired values. By adding a jump of $1 - H^*(\infty)$ at the
origin we obtain $H(x)$, a c.d.f. with the given odd moments.

The main statement of this note is an immediate consequence of the lemma.
Let the $k$th odd moment of $F(x)$ be $M_{2k-1}$, which we assume to be finite, and let
the sequence $\{m_{2k-1}\}$ be defined by the relationships:
$$\mu_{2k-1} = (1 - \epsilon)M_{2k-1} + \epsilon\,m_{2k-1}, \quad (k = 1, 2, \cdots).$$
Let $H(x)$ have the $m$'s as odd moments. The c.d.f. $F^*(x)$ defined by
$$F^*(x) = (1 - \epsilon)F(x) + \epsilon H(x)$$
clearly has the properties stated above, and our statement is proved. If the
moments of $F(x)$ are not all finite, the proof will need only minor modifications.

If one asks in addition that $F^*$ have a finite range, $F^*$ will, in general, not
exist. If, for example, the range of $F$ is finite and its odd moments are zero,
then $F$ must be symmetric about the origin, for $F^*$ defined by $dF^*(x) = dF(-x)$
would have the same moments as $F$. But a c.d.f. with finite range is determined
by its moments; hence $F(x) = F^*(x)$.


SOME ORDER STATISTIC DISTRIBUTIONS FOR SAMPLES
OF SIZE FOUR

By John E. Walsh

Princeton University

1. Summary. Let $x_1, x_2, x_3, x_4$ represent the values of a sample of size four
drawn from a normal population. There is no loss of generality in assuming
that the distribution function of this population has zero mean and unit variance.
Denote it by $N(0, 1)$. Let $x_{(i)}$ be the $i$th largest of $x_1, x_2, x_3, x_4$, counted in
increasing order (so that $x_{(1)} \le x_{(2)} \le x_{(3)} \le x_{(4)}$). The
purpose of this note is to determine the joint distribution of
$$x_{(4)} + x_{(3)} - x_{(2)} - x_{(1)}, \quad x_{(4)} - x_{(3)} + x_{(2)} - x_{(1)}, \quad\text{and}\quad x_{(4)} - x_{(3)} - x_{(2)} + x_{(1)},$$
and derive from this joint distribution the joint distributions of these statistics
taken in pairs, also the distribution of each statistic itself.

2. Analysis. Consider the joint distribution of
$$r_1 = \tfrac12(x_4 + x_3 - x_2 - x_1),$$
$$r_2 = \tfrac12(x_4 - x_3 + x_2 - x_1),$$
$$r_3 = \tfrac12(x_4 - x_3 - x_2 + x_1).$$
Evidently,
$$E(r_i) = 0, \ (i = 1, 2, 3). \qquad E(r_ir_j) = 0, \ (i \neq j). \qquad E(r_i^2) = 1.$$
Hence the $r_i$ are independently distributed according to $N(0, 1)$.

Let $v_j$ be the $j$th largest of $|r_1|, |r_2|, |r_3|$. Then by first finding the joint
distribution of $|r_1|, |r_2|, |r_3|$ and then applying the distribution for order
statistics [1], it is easily seen that the joint distribution element of $v_1, v_2, v_3$ is
$$48f(v_1)f(v_2)f(v_3)\,dv_1\,dv_2\,dv_3,$$
where
$$f(v) = \frac{1}{\sqrt{2\pi}}e^{-v^2/2}, \qquad 0 \le v_1 \le v_2 \le v_3 < \infty.$$
Examination shows, however, that
$$v_3 = \tfrac12\left(x_{(4)} + x_{(3)} - x_{(2)} - x_{(1)}\right),$$
$$v_2 = \tfrac12\left(x_{(4)} - x_{(3)} + x_{(2)} - x_{(1)}\right),$$
$$v_1 = \tfrac12\left|x_{(4)} - x_{(3)} - x_{(2)} + x_{(1)}\right|.$$
Let
$$m_3 = x_{(4)} + x_{(3)} - x_{(2)} - x_{(1)},$$
$$m_2 = x_{(4)} - x_{(3)} + x_{(2)} - x_{(1)},$$
$$m_1 = x_{(4)} - x_{(3)} - x_{(2)} + x_{(1)}.$$
Then the joint distribution element of $|m_1|$, $m_2$ and $m_3$ is
$$6f(\tfrac12|m_1|)f(\tfrac12m_2)f(\tfrac12m_3)\,d|m_1|\,dm_2\,dm_3.$$
Since the function $f$ is symmetrical about the origin, it follows immediately that
the joint distribution element of $m_1$, $m_2$ and $m_3$ is
$$3f(\tfrac12m_1)f(\tfrac12m_2)f(\tfrac12m_3)\,dm_1\,dm_2\,dm_3,$$
where $|m_1| \le m_2 \le m_3$.


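3. Derived results. By taking marginal distributions it is found that the
joint distribution elements of $m_1$, $m_2$ and $m_3$ taken in pairs are
$$g_1(m_1, m_2)\,dm_1\,dm_2 = 3\left(\int_{m_2}^{\infty} f(\tfrac12y)\,dy\right)f(\tfrac12m_1)f(\tfrac12m_2)\,dm_1\,dm_2,$$
$$g_2(m_1, m_3)\,dm_1\,dm_3 = 3\left(\int_{|m_1|}^{m_3} f(\tfrac12y)\,dy\right)f(\tfrac12m_1)f(\tfrac12m_3)\,dm_1\,dm_3,$$
$$g_3(m_2, m_3)\,dm_2\,dm_3 = 6\left(\int_{0}^{m_2} f(\tfrac12y)\,dy\right)f(\tfrac12m_2)f(\tfrac12m_3)\,dm_2\,dm_3.$$
The distribution elements of $m_1$, $m_2$ and $m_3$ are seen to be
$$g_1(m_1)\,dm_1 = \tfrac32\left(\int_{|m_1|}^{\infty} f(\tfrac12y)\,dy\right)^2 f(\tfrac12m_1)\,dm_1,$$
$$g_2(m_2)\,dm_2 = 6\left(\int_{0}^{m_2} f(\tfrac12y)\,dy\right)\left(\int_{m_2}^{\infty} f(\tfrac12y)\,dy\right)f(\tfrac12m_2)\,dm_2,$$
$$g_3(m_3)\,dm_3 = 3\left(\int_{0}^{m_3} f(\tfrac12y)\,dy\right)^2 f(\tfrac12m_3)\,dm_3.$$
It is to be noted that if $a > 0$,
$$\Pr(0 < m_1 < a) = \Pr(-a < m_1 < 0) = \tfrac12 - 4\left(\int_{a/2}^{\infty} f(y)\,dy\right)^3,$$
$$\Pr(0 < m_2 < a) = 12\left(\int_{0}^{a/2} f(y)\,dy\right)^2 - 16\left(\int_{0}^{a/2} f(y)\,dy\right)^3,$$
$$\Pr(0 < m_3 < a) = 8\left(\int_{0}^{a/2} f(y)\,dy\right)^3,$$
so that the probability that any of $m_1$, $m_2$, $m_3$ lie between two given numbers
is expressed explicitly and can be calculated with the aid of standard tables for
the normal distribution.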
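The three probabilities above need only the normal integral, so they can be evaluated with Python's standard `statistics.NormalDist` in place of tables. The function names below are illustrative assumptions; the statistics $m_1, m_2, m_3$ are as defined in Section 2.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal c.d.f.


def prob_m1(a):
    """Pr(0 < m_1 < a) = 1/2 - 4 (upper tail beyond a/2)^3, for a > 0."""
    tail = 1 - _PHI(a / 2)
    return 0.5 - 4 * tail ** 3


def prob_m2(a):
    """Pr(0 < m_2 < a) = 12 c^2 - 16 c^3, with c the normal integral from 0 to a/2."""
    c = _PHI(a / 2) - 0.5
    return 12 * c ** 2 - 16 * c ** 3


def prob_m3(a):
    """Pr(0 < m_3 < a) = 8 c^3, with c the normal integral from 0 to a/2."""
    c = _PHI(a / 2) - 0.5
    return 8 * c ** 3


values_at_two = (prob_m1(2.0), prob_m2(2.0), prob_m3(2.0))
```

As $a$ grows the three values tend to 1/2, 1 and 1 respectively, which reflects that $m_1$ is symmetric about zero while $m_2$ and $m_3$ are non-negative.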


4. Generalization of method. The method used to obtain the joint distribution
of the order statistics $m_1$, $m_2$ and $m_3$ was to take all possible combinations of
4 variables with two plus and two minus signs (except for factor of $-1$) and
show that these combinations behave as normally distributed independent
variables. The question arises as to whether this method of finding order
statistic distributions would apply in general to $2n$ variables with $n$ plus and $n$
minus signs. It is easily proved that this will occur only when $n = 2$.


REFERENCES
[1] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 90.
NEWS AND NOTICES
Readers are invited to submit to the Secretary of the Institute news items of interest 


Institute of Statistics of the University of North Carolina 


Announcement of detailed plans for the North Carolina All-University Insti- 
tute of Statistics has been made by Professor Gertrude M. Cox, Director of the 
Institute. 

To provide graduate-level training for students in statistics and to combine 
the theoretical or mathematical statistics with applied or experimental statistics, 
a Graduate Department of Mathematical Statistics is being set up at Chapel 
Hill with Professor Harold Hotelling as Head. The existing Department of 
Experimental Statistics at Raleigh is a part of the Institute, and will be headed
by Professor Gertrude M. Cox with Professor W. G. Cochran as Director of 
Research. Professors Hotelling and Cochran will be Associate Directors of the 
Institute. 

Professor Hotelling, who will head the Department at Chapel Hill, comes to
North Carolina from Columbia University, where he has been directing its 
graduate mathematical statistics program. Previously, he had held positions 
with the University of Washington, Princeton University and Stanford Uni- 
versity. His undergraduate training was taken at the University of Washington 
where he majored in journalism; his Master of Science degree was awarded by 
the same institution in mathematics; and his doctorate by Princeton University, 
also in mathematics. In addition, he has done some graduate work at the Uni- 
versity of Chicago. Professor Hotelling’s publications in mathematical statistics 
are numerous and well known. Among the members of his staff will be a visiting 
professor, M. S. Bartlett, on leave of absence from Cambridge University. A
graduate of Cambridge and native of England, Bartlett has also held positions 
with the University of London and the Imperial Chemical Industries, and during 
the war was engaged in war research in London. 

In addition, P. L. Hsu, William Madow, and Herbert Robbins will be members
of the Department at Chapel Hill as associate professors. Hsu, a native
of China, has held teaching positions with the University of Peking and the
University of London. He received his degrees from Tsinghua University and
the University of London.

Madow is now in Brazil, where he is serving as a visiting professor of statistics 
at the University of São Paulo. He received his training, both undergraduate
and graduate, from Columbia University, and has worked with the Department 
of Agriculture Graduate School and the Bureau of the Census in Washington. 

Robbins will come to the University of North Carolina from New York Uni- 
versity where he has been serving as an assistant professor. Prior to that, 
he was a staff member of the postgraduate school of the U. S. Naval Academy,
and an instructor in mathematics at New York University and at Harvard 
University. He holds A.B., A.M. and Ph.D. degrees from Harvard University. 
The appointment of Edward Paulson as an instructor completes the initial Department
staff at Chapel Hill. A graduate of Brooklyn College and holder of an
M.A. degree from Columbia University, Paulson has been more recently study- 
ing mathematical statistics at Columbia under a pre-doctoral fellowship of the 
National Research Council. 

Professor Cochran came to North Carolina in March from Ames, Iowa, 
where he had been serving as professor in the statistical laboratory of Iowa 
State College. During the war years he was sent to England, Germany, and 
Austria on special work for the War Department, after spending a year at 
Princeton University where he served as research statistician on war work. A 
native of Glasgow, Scotland, Cochran has been in the United States since 1939, 
and is a naturalized citizen. Before coming to America, he was employed as 
statistician with the Rothamsted Experimental Station in England. Cochran’s 
publications in both the theory of statistics and applied statistics are well 
known, as is his experience with practical research problems. He is serving this 
year as president of the Institute of Mathematical Statistics. He is a fellow 
of the American Statistical Association and a fellow of the Royal Statistical 
Society of England. 

Under the plans of the Institute, students who are preparing to teach statis- 
tics or to develop statistical theory will take most of their training at Chapel 
Hill. However, work between the two branches will be so coordinated as to 
include instruction in the application of statistics as taught in Raleigh. 

For students who intend to become statistical consultants in various other 
fields, basic training will be taken in mathematical statistics, with the main part
of the advanced applied training at Raleigh.

For research students, on both campuses, who are working in other sciences, 
including agriculture, biology, medicine, psychology, sociology, economics, industry,
and textiles, training in both basic and applied statistics will be given.

Working with Cochran in Raleigh are Professor J. A. Rigney; Associate 
Professors R. L. Anderson, J. M. Clarkson, H. L. Lucas, and Paul Peach;
Assistant Professor H. F. Robinson; Instructors Margaret Fleming, R. J. Monroe 
and Sarah Porter. 

Collaborators working with the Raleigh unit are A. L. Finkner, W. A. Hendricks
and F. E. McVay of the Bureau of Agricultural Economics; C. E. Lamoureaux
and G. P. Weber of the Weather Bureau; and D. D. Mason of the
Bureau of Plant Industry. 




Joint Session of the Institute and Section A of the AAAS 


A joint session of the Institute of Mathematical Statistics and Section A of
the American Association for the Advancement of Science was held in the
Municipal Auditorium at St. Louis on Saturday, March 30, 1946, at 2:00 P.M.
At this session invited addresses were given by Lieutenant Commander John H.
Curtiss on Statistical Inference and its Engineering Applications, and by Mr.
Morris H. Hansen on Some Sampling Problems in Surveys of Business and
Population.




Personal Items 


Dr. Paul H. Anderson is at present Economic Analyst with the War Assets 
Corporation at Washington. He is also teaching mathematics in the evening 
school of American University. 

Assistant Professor T. A. Bancroft has returned from a teaching position
at the University Study Center at Florence, Italy, to his position at Iowa State
College.

Associate Dean Walter Bartky of the University of Chicago has been appointed 
Dean of the Division of Physical Sciences. 

Mr. Gordon L. Beckstead is working toward his doctorate in statistics at the
University of California.

Mr. Donald Cody has returned to his position as Assistant Actuary at the 
Equitable Life Assurance Society after spending three years in war research 
with the NDRC, the Naval Ordnance Station at Indianapolis, and the Naval 
Ordnance Station at Inyokern, California. 

Professor Allen T. Craig, after war service at the Postgraduate School of the
U. S. Naval Academy at Annapolis, has returned to his position at the University
of Iowa.

Mr. James H. Davidson is studying for his doctorate in chemistry at Princeton 
University. 

Associate Professor J. L. Doob of the University of Illinois has been promoted 
to a professorship. 

Assistant Professor Churchill Eisenhart of the University of Wisconsin has 
been promoted to an associate professorship. 

Dr. Wayne Gutzman, recently discharged from the Navy as Lieutenant, has
assumed his new duties as Assistant Professor of Mathematics at the Postgraduate
School, Naval Academy, Annapolis, Maryland.

Mr. Bernard Hecht has been discharged from the Army and is now Chief 
Quality Control Engineer with the International Resistance Company at Phila- 
delphia. 

Dr. D. G. Humm has been elected president of the Southern California Acad- 
emy of Criminology. 

Mr. Amrom H. Katz is in charge of a group of physicists, engineers, and aerial
photographers representing the Aerial Photographic Laboratory at Wright
Field, which will record photographically various aspects of the forthcoming
atomic bomb test at Bikini Island.

Mr. Edward A. Lew has been released from active duty and has returned to
his former position as Assistant Actuary of the Metropolitan Life Insurance
Company.

Dr. E. V. Lewis is Junior Research Associate with E. I. duPont de Nemours
at the Nylon Research Laboratory at Wilmington.

Associate Professor M. C. MacPhail of Acadia University, Wolfville, Nova 
Scotia, has been promoted to a professorship. 

Mr. C. J. Maloney has been appointed to an instructorship in the department 
of mathematics at Iowa State College. 

Dr. Edward B. Olds is director of the Research Bureau of the Social Planning 
Council of St. Louis and St. Louis County. 

Dr. A. M. Peiser has been appointed head of the Statistics Research Group 
at the Langley Field Laboratory of the National Advisory Committee for Aero- 
nautics. 

Mr. Robert J. Saunders has been released from the Army and is now connected 
with Mohawk Carpet Mills at Amsterdam, N. Y. 

Mr. Benjamin Stauber is now Chief of the Relocation Planning Division, War 
Relocation Authority. He has transferred from the Department of Agriculture 
for this work. 

Mr. Arthur I. Sternhell returned from the Army to his position as general 
staff assistant in the Field Management Division of the Metropolitan Life 
Insurance Company. 

Mr. Harry Weingarten has been appointed Tutor of Mathematics at the Col- 
lege of the City of New York. 

Assistant Professor J. R. Vatnsdal has finished his army service and has 
returned to the State College of Washington where he was promoted to an asso- 
ciate professorship. 

Mr. Bertram Yood has completed his duty in the navy and is now at Yale 
Station, Connecticut. 

A symposium on mathematical statistics and probability was held at the 
University of California at Berkeley, January 28-30, 1946. 




New Members 


The following persons have been elected to membership in the Institute: 

Alchian, Prof. Armen A., Ph.D. (Stanford) Univ. of Oregon, Capt. (A.C.) Hq. AAF 
Training Command, Ft. Worth, Texas 

Bingham, M.D. 1920 S St., N. W., Washington, D. C. 

Cannon, Edward W., Ph.D. (Johns Hopkins) Comdr., US Navy, Research and Standards 
Branch of Bureau of Ships, Cannon, Delaware 

Carvalho, Prof. Pedro Egydio, Ph.D. (São Paulo) Univ. de São Paulo, Faculdade de Higiene, 
Avenida Dr. Arnaldo 85, Caixa postal 99-B, São Paulo, Brazil 

Delsa, Alexis, A. I. Lg. (Liege) Mgr. Basic Bessemer Steelworks, Société Anonyme John 
Cockerill, Seraing, Belgium 

Duncan, David Beattie, B.Sc. (Sydney) Graduate Student, Iowa State, Statistical Laboratory, 
Ames, Iowa 

Froelich, Kathryn, B.A. (Evansville) Statistician, US Dept. of Agriculture, Bureau of 
Human Nutrition and Home Economics, 1806 Monroe St., N. W., Washington 10, D.C. 

Goldstine, Herman H. Ph.D. (Chicago) Institute for Advanced Study, Princeton, N. J. 

Hammond, Edward Cuyler, Sc.D. (Johns Hopkins) Major A.C., US AAF, Chief, Statistics 
of Flying Personnel Branch, Office of the Air Surgeon, 4700 Connecticut Ave., Washington, 
D.C. 

Hsu, Prof. Pao-Lu, Ph.D. (London) Columbia University, 1027 John Jay Hall, Columbia 
Univ., New York City 

Kyle, Garland Dean, M.S. (Michigan) Spectroanalyst, Physicist (US Navy), 5848 Filbert, 
Philadelphia 39, Penn. 

Leibler, Richard A., Ph.D. (Illinois) Instructor, Purdue Univ., Math. Dept., Lafayette, 
Indiana 

Lessard, Prof. Roger, C.E. (Montreal) Hull Technical School, Hull, Quebec, Canada 

Mosimann, Thomas F., A.B. (Charleston) US Bur. Labor Statistics, Regional Employment 
Analyst, 4216 Western Ave., Dallas 11, Texas 

Patte, W. Edmund, B.A.Sc. (Toronto) Stat. Eng., Canadian Industries Ltd., Shawinigan 
Falls, P.Q. Canada, 550—16th St., Almaville 

Piza, Prof. Affonso P. de Toledo, Ph.D. (São Paulo) Escola Politechnica, São Paulo, Brazil, 
Rua Ministro Godoy, 1123 

Rozen, Daniel I., A.B. (Columbia) Stat., Medical Statistics Div., Office of the Surgeon 
General, War Department, Rm 317-1, 3415 38th St., N. W., Washington, D.C. 

Saidel, Frank, M.A. (Michigan State) Instructor in Math., Michigan State, East Lansing, 
Michigan 

Schmalz, William Herbert, B.Sc.A. (Toronto) Technical Superintendent, Dominion Rub- 
ber Company Limited, Merchants Rubber Factory, 51 Breithaupt St., Kitchener, Ont. 

Stehn, John R., Ph.D. (Wisconsin) Physicist, Research Division, Winchester Repeating 
Arms Co., New Haven, Conn. 

Tsao, Prof. Fei, Ph.D. (Minnesota) National Central University, Chungking, China 

Weaver, Chalmers L., B.S. (Kent State) Asst. Actuary, New England Mutual Life Ins. 
Co., 501 Boylston St., Boston, Mass. 

Weber, C. Jerome (New York) Personal Trust Officer, The Chase National Bank of the 
City of New York, 11 Broad Street, New York City, Chappaqua, New York, Box 63 

Whitney, Donald Ransom, M.A. (Princeton) Grad. Asst., Math. Dept., Ohio State Univ., 
Columbus, Ohio 

Wright, C. Ashley, M.A. (Princeton) Econ. Stat., Standard Oil Company, N.J., Box 34, 
RFD 6, Alexandria, Va. 

Yost, Earl K., Jr., B.S. (Washington and Jefferson) Grad. Asst., Math., Univ. of Oklahoma, 
843 College Ave., Norman, Okla. 


REPORT ON THE APRIL MEETING OF THE WASHINGTON 
CHAPTER OF THE INSTITUTE 


A meeting of the Washington Chapter of the Institute of Mathematical 
Statistics was held at George Washington University, Washington, D. C., 
on Friday and Saturday, April 12 and 13, 1946, in conjunction with a meeting 
of the Washington Chapter of the American Statistical Association. 

More than 100 people attended the meetings including the following 51 mem- 
bers of the Institute: 


Theodore W. Anderson, Jr., Richard O. Been, Archie Blake, David Blackwell, J. B. Boddie, 
Glenn W. Brier, William Cohen, Jerome Cornfield, John H. Curtiss, Bessie B. Day, 
Robert Dorfman, Thomas I. Edwards, Andrew Fraser, Meyer A. Girshick, Clyde H. Graves, 
Margaret J. Hagood, Major Edward C. Hammond, Morris H. Hansen, Alston S. Householder, 
Leonid Hurwicz, Irwin E. Jackson, Jr., Walter Jacobs, Hyman B. Kaitz, H. S. Konji, Lila F. 
Knudsen, Colonel S. Kullback, R. B. Ladd, H. G. Landen, Walter Leighton, Gerson Levin, 
Jacob E. Lieberman, Sophie Marcuse, Ethelyne L. McBee, William J. McCabe, Francis 
McIntyre, Dorothy Morrow, H. W. Norton, W. R. Pabst, Carl J. Rees, David Rosenblatt, 
M. Sandomire, Edward M. Schrock, L. W. Shaw, John H. Smith, Frederick F. Stephan, 
F. M. Wadley, A. Wald, F. M. Weida, Samuel Weiss, S. S. Wilks, C. P. Young. 

The session Friday evening was devoted to the following contributed papers: 


1. Estimation of the Parameters of a Single Stochastic Difference Equation in a Complete 
System. 
T. W. Anderson and H. Rubin, Cowles Commission for Economic Research 
M. A. Girshick, Bureau of Agricultural Economics 
Presented by T. W. Anderson 
2. Estimation of Linear Functions of Cell Proportions. 
J. H. Smith, Bureau of Labor Statistics 
3. On Functions of Sequences of Independent Chance Vectors with Applications to the 
Random Walk Problem in k Dimensions. 
D. Blackwell, Howard University 
M. A. Girshick, Bureau of Agricultural Economics 
Presented by D. Blackwell 
4. The Exact Power Curve and Distribution of n for the Sequential Binomial Probability 
Ratio Test. 
M. A. Girshick, Bureau of Agricultural Economics 





At a business meeting following the session of contributed papers, Professor 
F. M. Weida and Dr. John H. Smith were elected to succeed Colonel Kullback 
and Dr. Madow as members of the Program Committee. 

The program for Saturday morning was devoted to the following invited 
lectures: 


1. Recent Developments in the Measurement of Simultaneous Economic Relations. 
T. Koopmans, Cowles Commission for Economic Research 

2. Structural Estimation versus Regressions: Use for Policy and Prediction. 
Leonid Hurwicz, Cowles Commission for Economic Research 

The program for Saturday afternoon was devoted to the following: 


1. Basic Concepts Underlying Sequential Analysis with Applications. 
A. Wald, Columbia University 

2. Applications of Sequential Analysis to Acceptance Inspection. 
W. R. Pabst, Navy Department 


Irving Siegel, Veterans Administration, was chairman for the morning session 
and Professor F. M. Weida, George Washington University, for the afternoon 
session. 


A lively discussion followed the presentation of the papers. 

S. Kullback, 
Secretary, Washington Chapter. 





